I have a Parent entity with a @OneToMany relationship to a Child entity. Most of the time, when I need to work with a Parent's Child entities, I'm working with a single parent, so lazy fetching (FetchMode.SELECT) is appropriate.
However, I have a situation where I'm querying a large number of Parents (sometimes hundreds or even thousands) and I need to work with their Child entities. FetchMode.SELECT gives me a serious N+1 problem, so I need to do something different in this scenario. If I were doing this via JDBC, it'd be a single query for the Parent records, then another query for all the Child records using an IN clause (where child.parentid in (?,?,?,...)). I need live Hibernate entities, because Hibernate Search is going to call getChildren() as part of its indexing process.
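In JDBC terms, the two-query approach described above would look roughly like this sketch (the child table and parent_id column names are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

class TwoQueryFetch {
    // Query 2 of the approach: one IN query for all children of the
    // parents loaded by the first query.
    static void fetchChildren(Connection con, List<Long> parentIds) throws SQLException {
        String placeholders = parentIds.stream()
                .map(id -> "?")
                .collect(Collectors.joining(","));
        String sql = "select * from child where parent_id in (" + placeholders + ")";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int i = 1;
            for (Long id : parentIds) {
                ps.setLong(i++, id);
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // group each child row under its parent_id here
                }
            }
        }
    }
}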
The options I’ve considered are:
Criteria.setFetchMode("children", FetchMode.JOIN) (or join fetch in HQL) - this would give me a cartesian product, though, which is brutal with that many entities.
Adding @BatchSize to Parent.getChildren() - this would help for my big batch scenario, but it isn't really the strategy I want to use for normal operations. It'd be perfect if I could set a batch size for the fetch in my Criteria/HQL, but I can't find a way to do so.
Using FetchMode.SUBSELECT on Parent.getChildren() - much like @BatchSize, this would be great for my big batch scenario, but isn't appropriate for normal operations, and I can't find a way to use it with Criteria/HQL (Criteria and the entity annotations use different FetchMode enums, despite the duplicate name). A sketch of both mapping options follows this list.
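For reference, a minimal sketch of the two mapping-level options; Parent/Child are the names from the question, the batch size is arbitrary, and only one of the two Hibernate annotations would be used at a time:

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import org.hibernate.annotations.BatchSize;
import org.hibernate.annotations.Fetch;
import org.hibernate.annotations.FetchMode;

@Entity
public class Parent {
    @Id
    private Long id;

    // @BatchSize: still lazy, but Hibernate initializes up to 25 pending
    // collections per query instead of one at a time.
    @OneToMany(mappedBy = "parent")
    @BatchSize(size = 25)
    // Alternative: @Fetch(FetchMode.SUBSELECT) issues one extra query that
    // re-executes the original query as a subselect.
    // @Fetch(FetchMode.SUBSELECT)
    private List<Child> children;
}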
tldr; I have a one-to-many relationship with a lazy fetch mode, but sometimes I want to be able to efficiently load the relationship for many entities at once.
I am using Spring Boot and Hibernate.
Some complex logic, dictated by business, needs to use various nested fields, which traverse various DB relationships (some are NxN, Nx1, 1xN, 1x1).
I encountered the N+1 problem, and I solved it at first with HQL, but some queries need several joins and the result sets become unmanageable.
I started working on a custom utility that collects the ids of things that need to be fetched, fetches them all at once and uses the setters to populate the fields on the starting objects. This utility works for ManyToOne relationships, but is still inefficient with ManyToMany relationships, because it falls back into the N+1 problem when I collect the ids (it queries the join table once per object via the getter).
How can I solve this? Has this problem really not been solved yet? Am I missing some obvious setting that solves this automagically?
EDIT:
I made a toy example with some commentary: https://github.com/marcotama/n-1-queries-example
I faced the same situation and had 3 ways to solve it:
increase the batch size for the dependent attribute so that the queries are executed in batches
write a custom query for the purpose
define entity graphs and map them to the appropriate attributes
I personally preferred the 3rd option, as it was convenient to do and was cleaner with Spring Data JPA (a sketch follows below).
You can refer to the examples in the answers below:
Spring Data JPA And NamedEntityGraphs
What is the solution for the N+1 issue in JPA and Hibernate?
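A minimal sketch of the entity graph approach with Spring Data JPA; the Author/Book entities, the name field and the repository method are assumptions for illustration:

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.NamedAttributeNode;
import javax.persistence.NamedEntityGraph;
import javax.persistence.OneToMany;
import org.springframework.data.jpa.repository.EntityGraph;
import org.springframework.data.jpa.repository.JpaRepository;

@Entity
@NamedEntityGraph(name = "Author.books",
        attributeNodes = @NamedAttributeNode("books"))
public class Author {
    @Id
    private Long id;
    private String name;

    @OneToMany(mappedBy = "author")
    private List<Book> books;
}

// The association stays lazy everywhere else; only this repository method
// applies the named graph and loads authors with their books in one query.
interface AuthorRepository extends JpaRepository<Author, Long> {
    @EntityGraph("Author.books")
    List<Author> findByNameContaining(String name);
}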
Write fetch logic on your own.
E.g. you have an Author which has Books and AuthorDevices.
You can join fetch Author with Books. Then you can separately fetch the AuthorDevices via a repository query with where author_id in (:authorIds), collecting the ids with authors.stream().map(Author::getId). Then you should detach the Authors, iterate over the AuthorDevices and assign each one to the appropriate author's device list. I think it's the only adequate solution for situations where you need to join-fetch more than one relation.
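A rough sketch of that assembly; the repository methods (findAllWithBooks, backed by a join fetch query, and findByAuthorIdIn) and the setDevices setter are assumptions:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.persistence.EntityManager;

class AuthorAssembler {
    private final AuthorRepository authorRepository;
    private final AuthorDeviceRepository authorDeviceRepository;
    private final EntityManager entityManager;

    AuthorAssembler(AuthorRepository authorRepository,
                    AuthorDeviceRepository authorDeviceRepository,
                    EntityManager entityManager) {
        this.authorRepository = authorRepository;
        this.authorDeviceRepository = authorDeviceRepository;
        this.entityManager = entityManager;
    }

    List<Author> loadAuthorsWithBooksAndDevices() {
        // Query 1: authors join-fetched with their books.
        List<Author> authors = authorRepository.findAllWithBooks();
        List<Long> ids = authors.stream()
                .map(Author::getId)
                .collect(Collectors.toList());

        // Query 2: all devices for those authors in a single IN query,
        // grouped by author id.
        Map<Long, List<AuthorDevice>> devicesByAuthor = authorDeviceRepository
                .findByAuthorIdIn(ids).stream()
                .collect(Collectors.groupingBy(d -> d.getAuthor().getId()));

        // Detach first so the manual assignment is not flushed back,
        // then populate each author's device list.
        authors.forEach(entityManager::detach);
        authors.forEach(a -> a.setDevices(
                devicesByAuthor.getOrDefault(a.getId(), Collections.emptyList())));
        return authors;
    }
}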
Is there a set of rules for when we should abstain from using FetchType.EAGER?
I have heard that some JPA frameworks (e.g. Hibernate) are supposed to resolve cyclic dependencies.
Is there any advice on when it becomes risky to rely on the framework?
Is a StackOverflowError with FetchType.EAGER always caused by a bug in the framework (I mean, if I have only 3 rows in two tables)?
One good reason to avoid FetchType.EAGER is that you can always enable eager fetching manually in JPQL (with JOIN FETCH), but you can't enable lazy fetching if you've set FetchType.EAGER.
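For instance, a sketch with placeholder Parent/Child entities: the mapping stays lazy, and eager loading is opted into per query.

import java.util.List;
import javax.persistence.EntityManager;

class ParentLoader {
    private final EntityManager em;

    ParentLoader(EntityManager em) {
        this.em = em;
    }

    // JOIN FETCH overrides the lazy mapping for this one query only;
    // everywhere else, children remain lazily loaded.
    List<Parent> loadWithChildren() {
        return em.createQuery(
                "select distinct p from Parent p join fetch p.children",
                Parent.class).getResultList();
    }
}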
EAGER:
Convenient, but slow
Use EAGER when your parent class always needs the associated class.
LAZY:
More coding, but much more efficient
Basically, lazy loading has more benefits than the eager alternative (performance, use of resources). You should generally not configure an association for eager fetching unless you experience specific issues, for example:
Sharing a domain instance across different Hibernate sessions (e.g. when putting a domain class instance into the HTTP session scope and accessing properties from it, such as a User)
When you're sure you will access a certain relation property every time (or most of the time) an instance is fetched, it also makes sense to configure this relation for eager fetching.
E.g. the relation between Person and Address: it's not necessary to fetch the Address every time I load a Person entity, so you can keep it lazy, as in the sketch below.
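A minimal sketch of that mapping; the association is modelled here as @ManyToOne, which is an assumption:

import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

@Entity
public class Person {
    @Id
    private Long id;

    // To-one associations default to EAGER in JPA, so lazy loading
    // has to be requested explicitly.
    @ManyToOne(fetch = FetchType.LAZY)
    private Address address;
}

@Entity
class Address {
    @Id
    private Long id;
}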
Conclusion
The EAGER fetching strategy is a code smell. Most often it's used for simplicity's sake, without considering the long-term performance penalties. The fetching strategy should never be the entity mapping's responsibility. Each business use case has different entity load requirements, and therefore the fetching strategy should be delegated to each individual query.
The global fetch plan should only define LAZY associations, which are fetched on a per-query basis. Combined with the always-check-generated-queries strategy, query-based fetch plans can improve application performance and reduce maintenance costs.
My project recently discovered that Hibernate can take multiple levels of relationships and eager fetch them in a single HQL join to produce the fully populated object we need. We love this feature, figuring it would outperform lazy fetching.
Problem is, we hit a situation where a single parent has about a dozen direct relationships, and a few sub-relationships off of those, and a few of them have several dozen rows in some instances. The result is a pretty large cross product that leaves the HQL spinning its wheels virtually forever. We turned logging up to 11 and saw more than 100,000 iterations before we gave up and killed it.
So clearly, while this technique is great for some situations, it has limits like everything in life. But what is the best performing alternative in hibernate for this? We don't want to lazy-load these, because we'll get into an N+1 situation that will be even worse.
I'd ideally like to have Hibernate pre-fetch all the rows and details, but do it one relationship at a time, and then hydrate the right detail objects onto the right parents, but I have no idea whether it can do such a thing.
Suggestions?
UPDATE:
So we got the SQL this query generated, it turns out that I misdiagnosed the problem. The cross product is NOT that huge. We ran the same query in our database directly and got 500 rows returned in just over a second.
Yet we saw very clearly in the hibernate logging it making 100K iterations. Is it possible Hibernate can get caught in a loop in your relationships or something?
Or maybe this should be asked as a new question?
Our team uses a particular strategy for working with associations. Collections are lazy, and single-ended relations are lazy too, except for references with a simple structure (for example, a countries reference). And we use fluent-hibernate to load what we need in a concrete situation. This is straightforward because fluent-hibernate supports nested projections. You can refer to this unit test to see how a complex object net can be partially loaded. A code snippet from the unit test:
List<Root> roots = H.<Root> request(Root.class).proj(Root.ROOT_NAME)
.innerJoin("stationarFrom.stationar", "stationar")
.proj("stationar.name", "stationarFrom.stationar.name")
.eq(Root.ROOT_NAME, rootName).transform(Root.class).list();
See also
How to transform a flat result set using Hibernate
I am in the process of performance testing/optimizing a project that maps
a document <--> Java object tree <--> MySQL database
The document, Java classes, database schema and mapping logic are orchestrated with HyperJaxb3. The ORM piece of it is JPA, provided by Hibernate.
There are about 50 different entities and obviously lots of relationships between them. A major feature of the application is to load the documents and then reorganize the data into new documents; all the pieces of each incoming document eventually gets sent out in one outgoing document. While I would prefer to not be living in the relational world, the transactional semantics are a very good fit for this application - there is a lot of money and government regulation involved, so we need to make sure everything gets delivered exactly once.
Functionally, everything is going well and performance is decent (after a fair amount of tweaking). Each document is made up of a few thousand entities which end up creating a few thousand rows in the database. The documents vary in size, and insert performance is pretty much proportional to the number of rows that need to be inserted (no surprise there).
I see the potential for a significant optimization, and this is where my question lies.
Each document is mapped to a tree of entities. The "leaf" half of the tree contains lots of detailed information that is not used in the decisions for how to generate the outgoing documents. In other words, I don't need to be able to query/filter by the contents of many of the tables.
I would like to map the appropriate entity sub-trees to blobs, and thus save the overhead of inserting/updating/indexing the majority of the rows I am currently handling the usual way.
It seems that my best bet is to implement a custom EntityPersister and associate it with the appropriate entities. Is this the right way to go? The hibernate docs are not bad, but it is a fairly complex class that needs to be implemented and I am left with lots of questions after looking at the javadoc. Can you point me to a concrete, yet simple example that I can use as a starting point?
Any thoughts about another way to approach this optimization?
I've run into the same problem with storing large amounts of binary data. The solution I found worked best is a denormalization of the object model. For example, I create a master record, and then I create a second object that holds the binary data. On the master, use the @OneToOne mapping to the secondary object, but mark the association as lazy. Now the data will only be loaded if you need it.
The one thing that might slow you down is the outer join that Hibernate performs with all objects of this type. To avoid it, you can mark the association as mandatory (optional = false). But if the database doesn't give you a huge performance hit, I suggest you leave it alone. I found that Hibernate has a tendency to load the binary data immediately if I tried a regular join.
Finally, if you need to retrieve a lot of the binary data in a single SQL call, use HQL's join fetch. For example: from Article a join fetch a.data, where a.data is the one-to-one relationship to the binary holder. The HQL compiler will see this as an instruction to get all the data in a single SQL call.
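A sketch of that mapping, using the Article/data names from the example above (the ArticleData entity and its fields are assumptions):

import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.OneToOne;

@Entity
public class Article {
    @Id
    private Long id;

    // The heavy payload lives in a separate entity and is loaded on demand.
    // optional = false lets Hibernate use a proxy for the lazy one-to-one.
    @OneToOne(fetch = FetchType.LAZY, optional = false)
    private ArticleData data;
}

// Holder entity for the binary data; fetched in bulk with
// "from Article a join fetch a.data".
@Entity
class ArticleData {
    @Id
    private Long id;

    @Lob
    private byte[] content;
}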
HTH
I have, for example, a DB with the following entity and relation structure:
[Person] has many [Skills], [Skills] has many [Actions]
In the .hbm.xml's, I assign one-to-many relations for Person > Skills and Skills > Actions.
In a query, I would like to be able to control things such that when I query on Person, I eager load only the Skills. Currently, I seem to be stuck either eager loading NOTHING, generating an N+1 number of queries to get each Person's Skills, or generating an (N*N+1) number of queries as it eagerly loads the entire Person > Skills > Actions collection hierarchy.
How would I limit it such that I can control when I do and do not want to load the third-level table? For context, I could live with always having the Person > Skills collection initialized, ideally as a JOIN to prevent N+1 performance bottlenecks.
I use the following practice:
I try to avoid child collections on objects, especially if a child collection could have a lot of entries. If I need to get the children, I use a query to get them.
If I do have child collections, I always set collections to be lazy loaded.
For 'querying', using the Criteria API, I have a class that creates the query, executes it and returns the result. As part of building the query, I use root.fetch(Person_.skills); where root is javax.persistence.criteria.Root<Person> to eagerly load the collections I want.
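A sketch of the core of such a query class; Person_ is the generated JPA metamodel class mentioned above, and the entity names come from the question:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Root;

class PersonQueries {
    private final EntityManager em;

    PersonQueries(EntityManager em) {
        this.em = em;
    }

    // Eagerly loads exactly one collection (skills) for this query,
    // leaving the deeper Skills > Actions level lazy.
    List<Person> findAllWithSkills() {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<Person> cq = cb.createQuery(Person.class);
        Root<Person> root = cq.from(Person.class);
        root.fetch(Person_.skills);
        cq.select(root).distinct(true); // avoid duplicate parents from the join
        return em.createQuery(cq).getResultList();
    }
}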
It's a little bit off-topic, but you might consider using an implementation of a graph database to maintain data of this level of complexity, instead of an RDBMS and Hibernate. See neo4j, a graph database which allows you to create nodes (in your case persons and skills) and relations between them (extends, knows). You would then be able to easily traverse the data at any depth.
This turned out to be pretty easy to control at runtime.
In my .hbm.xml's I continued to declare my association sets as lazy (even extra lazy!).
In the HQL query, I write something like:
select distinct p from Person p
left join fetch p.skills
The fetch keyword forces eager loading for that particular join.