I have, for example, a DB with the following entity and relation structure:
[Person] has many [Skills], [Skills] has many [Actions]
In the .hbm.xml's, I assign one-to-many relations for person > skills, skills > actions.
In a query on Person, I would like to be able to control the loading so that only the Skills are eagerly loaded. Currently, I seem to be stuck: either I'm eager loading NOTHING and generating n+1 queries to get each Person's Skills, or I'm generating (n*n+1) queries as it eagerly loads the entire Person > Skills > Actions collection hierarchy.
How would I limit it such that I can control when I do and do not want to load the third depth table? For context, I'd be able to live with always having Person > Skills collection initialized, ideally as a JOIN to prevent n+1 performance bottlenecks.
I use the following practice:
I try to avoid child collections on objects, especially if a child collection could have a lot of entries. If I need to get the children, I use a query to get them.
If I do have child collections, I always set collections to be lazy loaded.
For 'querying', using the Criteria API, I have a class that creates the query, executes it and returns the result. As part of building the query, I use root.fetch(Person_.skills); where root is javax.persistence.criteria.Root<Person> to eagerly load the collections I want.
It's a little bit off-topic, but you might consider using a graph database instead of an RDBMS and Hibernate to maintain data of this level of complexity. See neo4j, a graph database that lets you create nodes (in your case persons, skills) and relations between them (extends, knows), so you can easily traverse the data to any depth.
This turned out to be pretty easy to control at runtime.
In my .hbm.xml's I continued to declare my association sets as lazy (even extra lazy!).
In the HQL query, I query like:
select distinct p from Person p
left join fetch p.skills
The fetch keyword forces eager loading for that particular join.
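To illustrate why the query above uses distinct: a join fetch returns one SQL row per (person, skill) pair, so the same person shows up once per skill. The following is a plain-Java sketch with no Hibernate involved; Row and collapse are hypothetical names, and it only imitates how such rows collapse into one entry per person.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JoinFetchRows {
    // One row of the joined result: the person's id plus one skill
    // (null when the person has no skills, as a left join would return).
    record Row(long personId, String skill) {}

    // Collapses joined rows into one entry per person, keeping first-seen order.
    static Map<Long, List<String>> collapse(List<Row> rows) {
        Map<Long, List<String>> skillsByPerson = new LinkedHashMap<>();
        for (Row row : rows) {
            List<String> skills =
                    skillsByPerson.computeIfAbsent(row.personId(), id -> new ArrayList<>());
            if (row.skill() != null) {
                skills.add(row.skill());
            }
        }
        return skillsByPerson;
    }
}
```

With rows (1, "java"), (1, "sql") and (2, null), collapse returns two entries: person 1 with both skills and person 2 with an empty list.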
I am using Spring Boot and Hibernate.
Some complex logic, dictated by business, needs to use various nested fields, which traverse various DB relationships (again, some are NxN, Nx1, 1xN, 1x1).
I encountered the N+1 problem, and I solved it at first with HQL, but some queries need several joins and the result sets become unmanageable.
I started working on a custom utility that collects the ids of things that need to be fetched, fetches them all at once and uses the setters to then populate the fields on the starting objects. This utility works for ManyToOne relationships, but is still inefficient with ManyToMany relationships, because it falls back into the N+1 problem when I collect the ids (as it queries the join table once per object via the getter).
How can I solve this? Has this problem really not been solved yet? Am I missing some obvious setting that solves this automagically?
EDIT:
I made a toy example with some commentary: https://github.com/marcotama/n-1-queries-example
I faced the same situation and found three ways to solve it:
1. Increase the batch fetch size for the dependent attribute so that the lazy loads are executed in batches.
2. Write a custom query for the purpose.
3. Define entity graphs and map them to the attributes you need.
I personally preferred the 3rd option, as it was convenient and cleaner with Spring Data JPA.
You can refer to the examples in the answers below:
Spring Data JPA And NamedEntityGraphs
What is the solution for the N+1 issue in JPA and Hibernate?
Write fetch logic on your own.
E.g. you have an author which has books and author_devices.
You can join fetch author with books. Then you can separately fetch author_devices using a repository query such as "where author_id IN (:authorIds)", with the ids collected via authorsList.stream().map(author -> author.getId()). Then you should detach the authors, iterate over the author_devices, and assign each one to the appropriate author's devices list. I think it's the only adequate solution for situations where you need to join-fetch more than one relation.
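The "assign it to the appropriate author" step can be sketched in plain Java as follows. This is a minimal sketch, not the poster's code: Author, Device, ids and assignDevices are hypothetical stand-ins for the mapped entities and helpers, and the device list is assumed to come from a single IN query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ManualFetch {
    // Hypothetical stand-ins for the entities; real code would use the mapped classes.
    static final class Author {
        final long id;
        final List<Device> devices = new ArrayList<>();
        Author(long id) { this.id = id; }
    }

    record Device(long authorId, String name) {}

    // Collects the ids to bind to the IN parameter of the second query.
    static List<Long> ids(List<Author> authors) {
        return authors.stream().map(a -> a.id).collect(Collectors.toList());
    }

    // Attaches the separately fetched devices to their authors, using one pass to
    // group the devices and one pass over the authors.
    static void assignDevices(List<Author> authors, List<Device> devices) {
        Map<Long, List<Device>> byAuthor = devices.stream()
                .collect(Collectors.groupingBy(Device::authorId));
        for (Author author : authors) {
            author.devices.addAll(byAuthor.getOrDefault(author.id, List.of()));
        }
    }
}
```

Grouping by the foreign key first keeps the work linear in the number of rows, instead of scanning the device list once per author.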
I have a Parent entity with a @OneToMany relationship with a Child entity. Most of the time, when I need to work with a Parent’s Child entities, I’m working with a single parent, so lazy fetching (FetchMode.SELECT) is appropriate.
However, I have a situation where I’m querying a large number of Parents (sometimes hundreds or even thousands), and I need to work with their Child entities. FetchMode.SELECT gives me a serious N+1 problem, so I need to do something different in this scenario. If I were doing this via JDBC, it’d be a single query for the Parent records, then another query for all the Child records using an IN clause (where child.parentid in (?,?,?....)). I need live Hibernate entities, because Hibernate Search is going to call getChildren() as part of its indexing process.
The options I’ve considered are:
Criteria.setFetchMode("children", FetchMode.JOIN) (or join fetch in HQL) - this would give me a cartesian product, though, which is brutal with that many entities.
Adding @BatchSize to Parent.getChildren() - this would help for my big batch scenario, but it isn’t really the strategy I want to use for normal operations. It’d be perfect if I could set a batch size for the fetch in my Criteria/HQL, but I can’t find a way to do so.
Using FetchMode.SUBSELECT in Parent.getChildren() - much like @BatchSize, this would be great for my big batch scenario, but isn’t appropriate for normal operations, and I can’t find a way to use it with Criteria/HQL (Criteria and the entity annotations use different FetchMode enums, despite the duplicate name).
tldr; I have a one-to-many relationship with a lazy fetch mode, but sometimes I want to be able to efficiently load the relationship for many entities at once.
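For reference, the JDBC-style "second query with an IN clause" described above could be sketched like this. It is only an illustration of the idea, not a Hibernate feature: chunk and childQuery are hypothetical helpers, the column name parentid comes from the question, and "select * from child" is an assumed table name. Chunking is included because databases commonly limit how many parameters one statement may carry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InClauseChunks {
    // Splits ids into consecutive chunks of at most chunkSize elements.
    static List<List<Long>> chunk(List<Long> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Long>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            chunks.add(ids.subList(start, Math.min(start + chunkSize, ids.size())));
        }
        return chunks;
    }

    // Builds the SQL text for one chunk, with one '?' placeholder per parent id.
    static String childQuery(int idCount) {
        return "select * from child where parentid in ("
                + String.join(",", Collections.nCopies(idCount, "?")) + ")";
    }
}
```

For five parent ids and a chunk size of two, this gives three statements with 2, 2 and 1 placeholders, in place of five single-parent selects.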
My project has recently discovered that Hibernate can take multiple levels of relationship and eager fetch them in a single join HQL to produce the filled object we need. We love this feature, figuring it would outperform a lazy fetch circumstance.
Problem is, we hit a situation where a single parent has about a dozen direct relationships, and a few subrelationships off of those, and a few of them have several dozen rows in a few instances. The result is a pretty large cross-product that results in the HQL spinning its wheels virtually forever. We turned logging up to 11 and saw more than 100000 iterations before we gave up and killed it.
So clearly, while this technique is great for some situations, it has limits like everything in life. But what is the best performing alternative in hibernate for this? We don't want to lazy-load these, because we'll get into an N+1 situation that will be even worse.
I'd ideally like to have Hibernate pre-fetch all the rows and details, but do it one relationship at a time, and then hydrate the right detail object to the right parent, but I have no idea if it does such a thing.
Suggestions?
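As a rough way to reason about the trade-off in the question, the sketch below compares how many rows come back for one parent when several sibling collections are join-fetched in a single query versus fetched one relationship at a time. It is a back-of-the-envelope estimate under the assumption that the collections hang directly off the same parent and are independent of each other; JoinRowEstimate and its methods are hypothetical names.

```java
final class JoinRowEstimate {
    // Rows returned for one parent when all its collections are join-fetched at once:
    // the sizes multiply, because every combination of child rows appears.
    static long singleJoinRows(int... collectionSizes) {
        long rows = 1;
        for (int size : collectionSizes) {
            rows *= Math.max(size, 1); // an empty collection still yields one row in a left join
        }
        return rows;
    }

    // Rows returned in total when each collection is fetched by its own query:
    // the sizes simply add up.
    static long separateQueryRows(int... collectionSizes) {
        long rows = 0;
        for (int size : collectionSizes) {
            rows += size;
        }
        return rows;
    }
}
```

For three collections of 30, 40 and 24 rows, the single join yields 28800 rows for that parent, while three separate queries return 94 rows in total.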
UPDATE:
So we got the SQL this query generated, it turns out that I misdiagnosed the problem. The cross product is NOT that huge. We ran the same query in our database directly and got 500 rows returned in just over a second.
Yet we saw very clearly in the hibernate logging it making 100K iterations. Is it possible Hibernate can get caught in a loop in your relationships or something?
Or maybe this should be asked as a new question?
Our team uses a particular strategy for working with associations. Collections are lazy, and single-valued associations are lazy too, except for references with a simple structure (for example, a countries reference). We use fluent-hibernate to load what we need in a concrete situation, simply because fluent-hibernate supports nested projections. You can refer to this unit test to see how a complex object net can be partially loaded. A code snippet from the unit test:
List<Root> roots = H.<Root> request(Root.class).proj(Root.ROOT_NAME)
.innerJoin("stationarFrom.stationar", "stationar")
.proj("stationar.name", "stationarFrom.stationar.name")
.eq(Root.ROOT_NAME, rootName).transform(Root.class).list();
See also
How to transform a flat result set using Hibernate
The Hibernate documentation describes @BatchSize as follows:
@BatchSize specifies a "batch size" for fetching instances of this
class by identifier. Not yet loaded instances are loaded batch-size at
a time (default 1).
I am not clear on the purpose of this annotation or when we need to use it. Can someone please help me understand when to use this annotation?
Using batch fetching, Hibernate can load several uninitialized proxies if one proxy is accessed. Batch fetching is an optimization of the lazy select fetching strategy. There are two ways you can configure batch fetching: on the class level and the collection level.
Batch fetching for classes/entities is easier to understand. Consider the following example: at runtime you have 25 Cat instances loaded in a Session, and each Cat has a reference to its owner, a Person. The Person class is mapped with a proxy, lazy="true". If you now iterate through all cats and call getOwner() on each, Hibernate will, by default, execute 25 SELECT statements to retrieve the proxied owners. You can tune this behavior by specifying a batch-size in the mapping of Person:
<class name="Person" batch-size="10">...</class>
Hibernate will now execute only three queries: the pattern is 10, 10, 5.
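The 10, 10, 5 pattern is just the 25 uninitialized proxies split into groups of at most 10. A small sketch of that arithmetic follows; BatchPattern.batches is a hypothetical helper, and the exact batch sizes a given Hibernate version issues may differ from this simple split.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPattern {
    // Returns how many proxies each batch query would load when `uninitialized`
    // proxies are loaded at most `batchSize` at a time.
    static List<Integer> batches(int uninitialized, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = uninitialized; remaining > 0; remaining -= batchSize) {
            sizes.add(Math.min(batchSize, remaining));
        }
        return sizes;
    }
}
```

batches(25, 10) gives [10, 10, 5], that is three queries, while batches(25, 1) gives 25 single-row queries, which is the default behaviour described above.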
@BatchSize makes Hibernate fetch records from the database in groups of the size you specify. Suppose your result has 100 records: if you define a batch size of 10, Hibernate fetches 10 records per database call rather than one at a time. Fewer round trips should improve performance.
Let's assume that you know that, to satisfy a certain request, you will have to select a specific Person from the database along with all of her Cats. As eager loading is considered a code smell and the mother of all evil, this association will be lazy, yet you still want to load the Person and all her Cat entities in one select statement.
You have several ways to do that: by using Named Entity Graphs or Dynamic Graphs, or specifying a "join fetch" in JPQL or Criteria API. Problem solved, you have an explicitly given eager loading select statement that loads everything in one step.
What if you want to query all Cats, Dogs, Mice, Lions and Parrots belonging to the Person? Assuming that all the different species are separate entities (having their own tables), you will end up with a query containing TOO MANY JOIN FETCH clauses to be effective. In this case, "eager loading" via the above way won't help.
But, with lazy loading, Hibernate will issue a select for each and every animal (creating the N+1 problem). This is the case where BatchSize can be a remedy. You can tell Hibernate to select multiple animals (of the same kind, probably) at once. For example, if you can guess how many animals (of the same kind) a Person typically has, you set that as a BatchSize for Hibernate, and you end up with one single query for each kind of animal.
If you want to see the queries (with the IN clause Hibernate uses for these batch queries), you can find a simple and short example on Thorben Janssen's site:
https://thorben-janssen.com/hibernate-tips-how-to-fetch-associations-in-batches/
I have a query that joins 5 tables.
Then I fill my hand-made object with the column values that I need.
What common solutions are there for this problem? Are there specific tools for it?
I'm only beginning to learn Hibernate, so my question would be: is Hibernate the right decision for this problem?
Hibernate maps a table to a class. So, there's no difference if I have 5 classes instead of 5 tables. It would still be difficult to join the query result into a class.
Could hibernate be used to map THE QUERY into the structure (class) I would define beforehand as we do with table mapping? Or even better, can it map the query result into the meaningful fields [auto-create the class with fields] as it does with reverse-engineering?
I've been thinking about views, but creating a new view every time we need a complex query is too verbose.
As S.Lott asked, here is a simple version of a question:
General problem:
select A.field_a, B.field_b, C.field_c
from table_a A inner join table_b B inner join table_c C
where ...
every table contains 100 fields
the query returns 3 fields, but each field comes from a different table
How do I solve that problem in an OO style?
Design a new object with properties corresponding to the returning values of the query.
I want to know if it is the right [and the only one possible] decision and are there any common solutions.
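The "new object with properties corresponding to the returned values" idea can be sketched as below. It assumes the query result arrives as one Object[] per row with columns in select order, and that all three columns are strings; AbcSummary and map are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class QueryResultMapping {
    // Hypothetical result class: one property per selected column.
    record AbcSummary(String fieldA, String fieldB, String fieldC) {}

    // Maps raw rows (columns in select order: field_a, field_b, field_c) to typed objects.
    static List<AbcSummary> map(List<Object[]> rows) {
        return rows.stream()
                .map(r -> new AbcSummary((String) r[0], (String) r[1], (String) r[2]))
                .collect(Collectors.toList());
    }
}
```

The rest of the application then works with AbcSummary objects and never sees the positional Object[] rows.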
See also my comments.
The point of ORM is to Map Objects to Relations.
The point of ORM is -- explicitly -- not to sweat the details of a specific SQL join.
One fundamental guideline to understanding ORM is this.
SQL joins are a hack because SQL doesn't have proper navigation.
To do ORM design, we have to intentionally set the SQL join considerations aside as (largely) irrelevant. Give up the old ways. It's okay, really. The SQL crutches aren't supporting us very well.
Step 1. Define the domain of discourse. The real-world objects.
Step 2. Define implementation classes that are a high-fidelity model of real-world things.
Step 3. Map the objects to relations. Here's where the hack-arounds start. SQL doesn't have a variety of collections -- it only has tables. SQL doesn't have subclasses, it only has tables. So you have to design a "good-enough" mapping between object classes and tables. Ideally, this is one-to-one. But in reality, it doesn't work out that way. Often you will have some denormalization to handle class hierarchies. Other than that, it should work out reasonably well.
Yes you have to add many-to-many association tables that have no object mapping.
Step 4. You're done. Write your application.
"But what about my query that joins 5 (or 3) tables but only takes one attribute from each table?"
What about it? One of those tables is the real object you're dealing with. The others of those 5 (or 3) tables are either part of 1-m nested collections, m-1 containers or m-m associations. That's just navigation among objects.
A 1-m nested collection is the kind of thing that SQL treats as a "result set". In ORM it will become a proper object collection.
An m-1 container is the typical FK relationship. In ORM it's just a fetch of a related object through ordinary object navigation.
An m-m association is also an object collection. It's a strange collection because two objects are members of each other's collections, but it's just an object collection.
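The three kinds of navigation described above look like this on the object side. This is a plain-Java sketch without any mapping annotations; Owner, Pet and Club are hypothetical classes picked only to show one 1-m collection, one m-1 reference and one m-m association.

```java
import java.util.ArrayList;
import java.util.List;

final class NavigationSketch {
    static final class Owner {
        final String name;
        final List<Pet> pets = new ArrayList<>();   // 1-m: a nested collection
        final List<Club> clubs = new ArrayList<>(); // m-m: this side of the association
        Owner(String name) { this.name = name; }
    }

    static final class Pet {
        final String name;
        final Owner owner; // m-1: the FK column becomes a plain object reference
        Pet(String name, Owner owner) {
            this.name = name;
            this.owner = owner;
            owner.pets.add(this);
        }
    }

    static final class Club {
        final String name;
        final List<Owner> members = new ArrayList<>(); // m-m: the other side
        Club(String name) { this.name = name; }

        // Keeps both sides of the m-m association in sync.
        void admit(Owner owner) {
            members.add(owner);
            owner.clubs.add(this);
        }
    }
}
```

Code that needs "one attribute from each table" then reads something like pet.owner.name or owner.clubs.get(0).name, and the join is left to the mapping layer.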
At no time do you design an object that matches a query. You design an object that matches the real world, map that to the database.
"What about performance?" It's only a problem when you subvert the ORM's simple mapping rules. Once in a blue moon you have to create a special-purpose view to handle really big batch-oriented joins among objects. But this is really rare. Often, rethinking your Java program's navigation patterns will improve performance.
Remember, ORMs cache your results. Each "navigation" may not be a complete "round-trip" to the database. Some queries may be batched by the ORM for you.
There are a few options:
Create a single table mapping using <join> elements for the related tables. A join in that way will allow other tables to contribute properties to your class.
Use a database view as previously suggested.
Use a Hibernate mapping view - instead of <class name=... table=... you can use <class name=... subselect="select A.field_a, B.field_b, ... from A, B, ...">. It's essentially creating a view on the Hibernate side so the database doesn't have to change. The generated SQL will end up looking like "select * from (select A.field_a, B.field_b from A, B, ...)". I know that works in Oracle, DB2, and MySQL.
All that is fine for selecting; if you need to do insert/update, you'll probably need to rethink your data model or your object model.
I think you could use the Criteria API in Hibernate to map the results of your join into your target class.