Understanding #BatchSize in Hibernate

Understanding #BatchSize in Hibernate - java

The Hibernate documentation gives some information at #BatchSize as :
#BatchSize specifies a "batch size" for fetching instances of this
class by identifier. Not yet loaded instances are loaded batch-size at
a time (default 1).
I am not clear on what is the purpose of this annotation, when we need to use this. Can some please help me in understanding when to use this annotation.

Using batch fetching, Hibernate can load several uninitialized proxies if one proxy is accessed. Batch fetching is an optimization of the lazy select fetching strategy. There are two ways you can configure batch fetching: on the class level and the collection level.
Batch fetching for classes/entities is easier to understand. Consider the following example: at runtime you have 25 Cat instances loaded in a Session, and each Cat has a reference to its owner, a Person. The Person class is mapped with a proxy, lazy="true". If you now iterate through all cats and call getOwner() on each, Hibernate will, by default, execute 25 SELECT statements to retrieve the proxied owners. You can tune this behavior by specifying a batch-size in the mapping of Person:
<class name="Person" batch-size="10">...</class>
Hibernate will now execute only three queries: the pattern is 10, 10, 5.

#BatchSize will fetch record from db based on your size. Think you have 100 records in your result and if you are defined batch size as 10 then it will fetch 10 records per db call. Logically it will increase performance.

Let's assume that you know that for satisfying a certain request, you will have to select a specific Person from the database along with all of her Cats. As eager loading is considered code smell and mother of all evil, this association will be lazy - you still want to load the Person and all her Cat entities in one select statement.
You have several ways to do that: by using Named Entity Graphs or Dynamic Graphs, or specifying a "join fetch" in JPQL or Criteria API. Problem solved, you have an explicitly given eager loading select statement that loads everything in one step.
What if you want to query all Cats, Dogs, Mice, Lions and Parrots belonging to the Person? Assuming that all the different species are separate entities (having their own tables), you will end up with a query containing TOO MANY JOIN FETCH clauses to be effective. In this case, "eager loading" via the above way won't help.
But, with lazy loading, Hibernate will issue a select for each and every animal (creating the N+1 problem). This is the case where BatchSize can be a remedy. You can tell Hibernate to select multiple animals (of the same kind, probably) at once. For example, if you can guess how many animals (of the same kind) a Person typically has, you set that as a BatchSize for Hibernate, and you end up with one single query for each kind of animal.
If you want to see the queries (with the IN clause Hibernate uses for these batch queries), you can find a simple and short example on Thorben Janssen's site:
https://thorben-janssen.com/hibernate-tips-how-to-fetch-associations-in-batches/

Related

Efficient Hibernate criteria for join returning many partial duplicates

I'm fetching a long list of entities which refer to others which refer to... and, at the end, usually of all them refer to a single user as their owner. Not really surprising as what's queried are entities belonging to a single user. There are more parts duplicated in many rows; actually, just a small percentage are unique data. As the query seems to be slow, I though I could gain a bit by fetching things separately using
criteria.setFetchMode(path, FetchMode.SELECT);
This works in my above case, but when querying over many users (as admin), it gets terrible, as hibernate issues a separate query for every user, instead of something like
SELECT * FROM User WHERE id IN (?, ?, ..., ?)
or not fetching them at all (which can't get any worse than one query per entity). I wonder what am I missing?
So instead of fetching a lot of redundant data, I ran into the 1+N problem, where obviously 1+1 queries would do.
Is there a way to instruct Hibernate to use the right query?
Is there a way to prevent Hibernate from fetching the owners by specifying it in the criteria itself (rather than putting fetch=FetchType.LAZY on the field; the laziness should be query-specific)?
I don't think it matters, but my classes are like
class Child {
#ManyToOne Father father;
#ManyToOne Mother mother;
...
}
class Father {
#ManyToOne User owner;
...
}
class Mother {
#ManyToOne User owner;
...
}
and the query is like
createCriteria(Child.class)
.add(Restrictions.in("id", idList))
.add(Restrictions.eq("isDeleted", false))
.createAlias("Father", "f")
.add(Restrictions.eq("f.isDeleted", false))
.setFetchMode("f.owner", FetchMode.SELECT)
.createAlias("Mother", "m")
.add(Restrictions.eq("m.isDeleted", false))
.setFetchMode("m.owner", FetchMode.SELECT)
.list();
The important part is that owner does not get used and can be proxied. The javadoc for FetchMode.SELECT says
Fetch eagerly, using a separate select
so it basically promises 1+1 querying which I want rather than "using a separate select per entity".

Fetch profiles are meant to help you achieve what you want, but are very limited at the time being and you can override the default fetch plan/strategy only with the join-style fetch profiles (you can make a lazy association eager, but not vice versa). However, you could use this trick to invert that behaviour: Make the association lazy by default and enable the profile by default for all sessions/transactions. Then disable the profile in transactions in which you want lazy loading.
IMHO the solution above looks too cumbersome, and the approach I use in most use cases to avoid both loading of redundant data and N+1 selects problem is to make associations lazy and define batch size.

unless the property is declared with
#ManyToOne(fetch=FetchType.LAZY), you can't change anything
True, at least for the time being, until fetch profile capabilities are extended to provide the ability to change eager loading to lazy.
the default is FetchType.EAGER, which is stupid, as it can't be
overridden
True, and I agree that it is bad, but in Hibernate native API everything is lazy by default; it is JPA that mandates to-one associations to be eager unless explicitly specified otherwise.
using criteria.setFetchMode(path, FetchMode.SELECT) is pointless as
it's always a no-op (either it gets ignored because of the
non-overridable eagerness of the property or the property is lazy
already)!
With it you should be able to override other lazy fetch modes. See HHH-980 and this comment from one of the lead Hibernate contributors about the javadoc confusion.
fetching lazily leads by default to the 1+N problem
It has nothing to do with lazy loading specifically, it is the default for eager loading as well if you don't fetch the eagerly loaded association in the same query.
it can be controlled via a class-level #BatchSize annotation
You have to place it on class-level for it to take effect on to-one associations with that entity; this answer is helpful. For collection associations (to-many associations with that entity defined in other entities) you have the flexibility to define it separately for each association.

To summarize my frustration... Hibernate is full of surprises (bugs?) in this respect:
unless the property is declared with #ManyToOne(fetch=FetchType.LAZY), you can't change anything
the default is FetchType.EAGER, which is stupid, as it can't be overridden
using criteria.setFetchMode(path, FetchMode.SELECT) is pointless as it's always a no-op (either it gets ignored because of the non-overridable eagerness of the property or the property is lazy already)!
fetching lazily leads by default to the 1+N problem
it can be controlled via a class-level #BatchSize annotation
placing a #BatchSize annotation on a scalar field gets silently ignored
In order to get what I wanted (two SQL queries), I need just two things:
declare the property with #ManyToOne(fetch=FetchType.LAZY)
place #BatchSize(size=aLot) on the class of the property
That's simple, but a bit hard to find out (because of all the ignored things above). I haven't looked into fetch profiles yet.

I wrote a small project to demonstrate the behavior. The SQL generated from your criteria is as follows:
select
this_.id as id1_0_4_,
this_.father_id as father_i3_0_4_,
this_.isDeleted as isDelete2_0_4_,
this_.mother_id as mother_i4_0_4_,
f1_.id as id1_1_0_,
f1_.isDeleted as isDelete2_1_0_,
f1_.owner_id as owner_id3_1_0_,
user5_.id as id1_3_1_,
user5_.isDeleted as isDelete2_3_1_,
m2_.id as id1_2_2_,
m2_.isDeleted as isDelete2_2_2_,
m2_.owner_id as owner_id3_2_2_,
user7_.id as id1_3_3_,
user7_.isDeleted as isDelete2_3_3_
from
Child this_
inner join
Father f1_
on this_.father_id=f1_.id
left outer join
User user5_
on f1_.owner_id=user5_.id
inner join
Mother m2_
on this_.mother_id=m2_.id
left outer join
User user7_
on m2_.owner_id=user7_.id
where
this_.id in (
?, ?
)
and this_.isDeleted=?
and f1_.isDeleted=?
and m2_.isDeleted=?
Changing the FetchMode in the criteria API did not affect the query. The owner data is still queried.
The Ids are in the "in" clause and Hibernate did not issue separate queries for each Id.
As mentioned in other answers, if the entity relation is set to EAGER, ie JPA default, then you can't change the fetch mode in the Criteria API. The fetch mode needs to be changed to LAZY.
You can see it here

Efficiently load large batch of OneToMany collections at once with Hibernate

I have a Parent entity with a #OneToMany relationship with a Child entity. Most of the time, when I need to work with a Parent’s Child entities, I’m working with a single parent, so lazy fetching (FetchMode.SELECT) is appropriate.
However, I have a situation where I’m querying a large number of Parents (sometimes hundreds or even thousands), and I need to work with their Child entities. FetchMode.SELECT gives me a serious N+1 problem, so I need to do something different in this scenario. If I were doing this via JDBC, it’d be single query for the Parent records, then another query for all the Child records using an IN statement (where child.parentid in (?,?,?....)). I need live Hibernate entities, because Hibernate Search is going to call getChildren() as part of its indexing process.
The options I’ve considered are:
Criteria.setFetchMode(“children”, FetchMode.JOIN) (or join fetch in HQL) - this would give me a cartesian product, though, which is brutal with that many entities.
Adding #BatchSize to Parent.getChildren() - this would help for my big batch scenario, but it isn’t really the strategy I want to use for normal operations. It’d be perfect if I could set a batch size for the fetch in my Criteria/HQL, but I can’t find a way to do so.
Using FetchMode.SUBSELECT in Parent.getChildren() - much like #BatchSize, this would be great for my big batch scenario, but isn’t appropriate for normal operations, and I can’t find a way to use it with Criteria/HQL (Criteria and the entity annotations use different FetchMode enums, despite the duplicate name).
tldr; I have a one-to-many relationship with a lazy fetch mode, but sometimes I want to be able to efficiently load the relationship for many entities at once.

HQL joined query to eager fetch a large number of relationships

My project has recently discovered that Hibernate can take multiple levels of relationship and eager fetch them in a single join HQL to produce the filled object we need. We love this feature, figuring it would outperform a lazy fetch circumstance.
Problem is, we hit a situation where a single parent has about a dozen direct relationships, an a few subrelationships off of that, and a few of them have several dozen rows in a few instances. The result is a pretty large cross-product that results in the hql spinning it's wheels virtually forever. We turned logging up to 11 and saw more than 100000 iterations before we gave up and killed it.
So clearly, while this technique is great for some situations, it has limits like everything in life. But what is the best performing alternative in hibernate for this? We don't want to lazy-load these, because we'll get into an N+1 situation that will be even worse.
I'd ideally like to have Hibernate pre-fetch all the rows and details, but do it one relationship at a time, and then hydrate the right detail object to the right parent, but I have no idea if it does such a thing.
Suggestions?
UPDATE:
So we got the SQL this query generated, it turns out that I misdiagnosed the problem. The cross product is NOT that huge. We ran the same query in our database directly and got 500 rows returned in just over a second.
Yet we saw very clearly in the hibernate logging it making 100K iterations. Is it possible Hibernate can get caught in a loop in your relationships or something?
Or maybe this should be asked as a new question?

Our team uses the special strategy to work with associations. Collections are lazy, single relations are lazy too, except references with simply structure (for an example a countries reference). And we use fluent-hibernate to load what we need in a concrete situation. It is simply because of fluent-hibernate supports nested projections. You can refer this unit test to see how complex object net can be partially loaded. A code snippet from the unit test
List<Root> roots = H.<Root> request(Root.class).proj(Root.ROOT_NAME)
.innerJoin("stationarFrom.stationar", "stationar")
.proj("stationar.name", "stationarFrom.stationar.name")
.eq(Root.ROOT_NAME, rootName).transform(Root.class).list();
See also
How to transform a flat result set using Hibernate

Hibernate - Limit eager loading collections to one table deep

I have for example, a DB with the following entity and relation structure:
[Person] has many [Skills], [Skills] has many [Actions]
In the .hbm.xml's, I assign one-to-many relations for person > skills, skills > actions.
In a query, I would like to be able to control when I query on Person, to eager load only the Skills. Currently, I seem to be stuck where i'm eager loading NOTHING, and generating a n+1 amount of queries to get a Person's Skills, or am generating an (n*n+1) amount of queries as it eagerly loads the entire Person > Skills > Actions collection hierarchy.
How would I limit it such that I can control when I do and do not want to load the third depth table? For context, I'd be able to live with always having Person > Skills collection initialized, ideally as a JOIN to prevent n+1 performance bottlenecks.

I use the following practice:
I try to avoid child collections on objects, especially if a child collection could have a lot of entries. If I need to get the children, I use a query to get them.
If I do have child collections, I always set collections to be lazy loaded.
For 'querying', using the Criteria API, I have a class that creates the query, executes it and returns the result. As part of building the query, I use root.fetch(Person_.skills); where root is javax.persistence.criteria.Root<Person> to eagerly load the collections I want.

It's a little bit off-topic, but you might consider to use some implementation of Graph Database to maintain data of such level of complexity instead of RDBMS and Hiberante. See neo4j, the graph database , which allows to create nodes (in you case persons, skills) and relations between them (extends, knows). So you will be able to easily traversal a data at any level of deep.

This turned out to be pretty easy to control at runtime.
In my .hbm.xml's i continued to declare my association sets as lazy, (even extra lazy!).
In the HQL query, I query like:
Select distinct p from Person
left join fetch p.skills
the fetch keyword forces eager loading for that particular join.

How to map a database query into an Object [in Java]?

I have a query that joins 5 tables.
Then I fill my hand-made object with the column values that I need.
What are solutions here that are wide-common to solve that problem using specific tools ? are there such tools?
I'm only beginning to learn Hibernate, so my question would be: is Hibernate the right decision for this problem?
Hibernate maps a table to a class. So, there's no difference if I would have 5 classes instead of 5 tables. It would still be difficult to join the query result into a class
Could hibernate be used to map THE QUERY into the structure (class) I would define beforehand as we do with table mapping? Or even better, can it map the query result into the meaningful fields [auto-create the class with fields] as it does with reverse-engineering?
I've been thinking about views but.. create a new view everytime we need a complex query.. too verbose.
As S.Lott asked, here is a simple version of a question:
General problem:
select A.field_a, B.field_b, C.field_c
from table_a A inner join table_b B inner join table_c C
where ...
every table contains 100 fields
query returns 3 fields, but every field belongs to the unique table
How do I solve that problem in an OO style?
Design a new object with properties corresponding to the returning values of the query.
I want to know if it is the right [and the only one possible] decision and are there any common solutions.
See also my comments.

The point of ORM is to Map Objects to Relations.
The point of ORM is -- explicitly -- not to sweat the details of a specific SQL join.
One fundamental guideline to understanding ORM is this.
SQL joins are a hack because SQL doesn't have proper navigation.
To do ORM design, we to intentionally set the SQL join considerations aside as (largely) irrelevant. Give up the old ways. It's okay, really. The SQL crutches aren't supporting us very well.
Step 1. Define the domain of discourse. The real-world objects.
Step 2. Define implementation classes that are a high-fidelity model of real-world things.
Step 3. Map the objects to relations. Here's where the hack-arounds start. SQL doesn't have a variety of collections -- it only has tables. SQL doesn't have subclasses, it only has tables. So you have to design a "good-enough" mapping between object classes and tables. Ideally, this is one-to-one. But in reality, it doesn't work out that way. Often you will have some denormalization to handle class hierarchies. Other than that, it should work out reasonably well.
Yes you have to add many-to-many association tables that have no object mapping.
Step 4. You're done. Write your application.
"But what about my query that joins 5 (or 3) tables but only takes one attribute from each table?"
What about it? One of those tables is the real object you're dealing with. The other of those 5 (or 3) tables are either part of 1-m nested collections, m-1 containers or m-m associations. That's just navigation among objects.
A 1-m nested collection is the kind of thing that SQL treats as a "result set". In ORM it will become a proper object collection.
A m-1 contain is the typical FK relationship. In ORM it's just a fetch of a related object through ordinary object navigation.
A m-m association is also an object collection. It's a strange collection because two objects are members of each other's collections, but it's just an object collection.
At no time do you design an object that matches a query. You design an object that matches the real world, map that to the database.
"What about performance?" It's only a problem when you subvert the ORM's simple mapping rules. Once in a blue moon you have to create a special-purpose view to handle really big batch-oriented joins among objects. But this is really rare. Often, rethinking your Java program's navigation patterns will improve performance.
Remember, ORM's cache your results. Each "navigation" may not be a complete "round-trip" to the database query. Some queries may be batched by the ORM for you.

There are a few options:
Create a single table mapping using <join> elements for the related tables. A join in that way will allow other tables to contribute properties to your class.
Use a database view as previously suggested.
Use a Hibernate mapping view - instead of <class name=... table=... you can use <class name=... select="select A.field_a, B.field_b, ... from A, B, ...">. It's essentially creating a view on the Hibernate side so the database doesn't have to change. The generated sql will end up looking like "select * from (select A.field_a, B.field_b from A, B, ...)
". I know that works in Oracle, DB2, and MySQL
All that is fine for selecting; if you need to do insert/update, you'll probably need to rethink your data model or your object model.

I think you could use the Criteria API in Hibernate to map the results of your join into your target class.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.