Query with JOIN FETCH performance problem - java

I have a problem with Hibernate query performance which I can't figure out. In the code snippet below I need to select entities that have at least one mapping, and I only want the filtered mappings loaded, so I'm using JOIN FETCH for that.
With that query, however, I run into performance problems, and Hibernate logs this warning:
org.hibernate.hql.ast.QueryTranslatorImpl - firstResult/maxResults specified with collection fetch; applying in memory!
When I omit the FETCH and leave only a JOIN, the query is nice and fast, but then all mappings are loaded into the entity, which is not acceptable for me. Is there a way to improve the query performance? There are a lot of rows in the mapping table.
HQL query :
select distinct e from Entity e
join fetch e.mappings as mapping
where e.deleted = 0 and e.mappings is not empty
and e = mapping.entity and mapping.approval in (:approvals)
Entities :
@Entity
@Table(name="entity")
class Entity {
    ...
    @OneToMany(mappedBy="entity", cascade=CascadeType.REMOVE, fetch=FetchType.LAZY)
    @OrderBy("created")
    private List<Mapping> mappings = new ArrayList<Mapping>();
    ...
}
@Entity
@Table(name="mapping")
class Mapping {

    public static enum MappingApproval {
        WAITING,  // mapping is waiting for approval
        APPROVED, // mapping was approved
        DECLINED; // mapping was declined
    }
    ...
    @ManyToOne(fetch=FetchType.EAGER)
    @JoinColumn(name="entity_id", nullable=false)
    private Entity entity;

    @Enumerated(EnumType.STRING)
    @Column(name="approval", length=20)
    private MappingApproval approval;
    ...
}
Thanks

From the JPA specification:
The effect of applying setMaxResults or setFirstResult to a query involving fetch joins over collections is undefined. (JPA "Enterprise JavaBeans 3.0, Final Release", Chapter 3.6.1 Query Interface)
Hibernate does the right thing, but it executes part of the query in memory, which is tremendously slower. In my case the difference was between 3-5 ms and 400-500 ms.
My solution was to implement the paging within the query itself, which works fast with the JOIN FETCH.

If you need firstResult/maxResults together with a "fetch", you can split your query into two queries:
Query your entity ids with firstResult/maxResults but without the "fetch" on the sub-tables:
select entity.id from entity (without fetch) where .... (with firstResult/maxResults)
Query your entities with the "fetch", restricted to the ids returned by the first query:
select entity from entity fetch ... where id in <previous ids>
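Applied to the entities from the question, a minimal sketch of this two-query approach could look like the following (assuming a Long id, and that the approvals collection and the paging values are bound elsewhere; names are illustrative):
// 1) Page over matching Entity ids only; without a collection fetch the
//    database applies firstResult/maxResults.
List<Long> ids = entityManager.createQuery(
        "select e.id from Entity e " +
        "where e.deleted = 0 and exists (" +
        "  select m from Mapping m " +
        "  where m.entity = e and m.approval in (:approvals))", Long.class)
    .setParameter("approvals", approvals)
    .setFirstResult(firstResult)
    .setMaxResults(maxResults)
    .getResultList();

// 2) Fetch the entities together with only the filtered mappings for those ids.
List<Entity> result = entityManager.createQuery(
        "select distinct e from Entity e " +
        "join fetch e.mappings m " +
        "where e.id in (:ids) and m.approval in (:approvals)", Entity.class)
    .setParameter("ids", ids)
    .setParameter("approvals", approvals)
    .getResultList();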

The reason it is slow is that Hibernate executes the SQL query with no pagination at all and applies the restriction in memory.
However, if the join has to scan and fetch 100k records while you are interested in just 100 results, then 99.9% of the work done by the result extractor, and all the I/O over the network, is just waste.
You can turn a JPQL query that uses both JOIN FETCH and pagination:
List<Post> posts = entityManager.createQuery("""
select p
from Post p
left join fetch p.comments
where p.title like :title
order by p.id
""", Post.class)
.setParameter("title", titlePattern)
.setMaxResults(maxResults)
.getResultList();
into an SQL query that limits the result using DENSE_RANK by the parent identifier:
@NamedNativeQuery(
    name = "PostWithCommentByRank",
    query =
        "SELECT * " +
        "FROM ( " +
        "    SELECT *, dense_rank() OVER (ORDER BY \"p.created_on\", \"p.id\") rank " +
        "    FROM ( " +
        "        SELECT p.id AS \"p.id\", " +
        "               p.created_on AS \"p.created_on\", " +
        "               p.title AS \"p.title\", " +
        "               pc.id AS \"pc.id\", " +
        "               pc.created_on AS \"pc.created_on\", " +
        "               pc.review AS \"pc.review\", " +
        "               pc.post_id AS \"pc.post_id\" " +
        "        FROM post p " +
        "        LEFT JOIN post_comment pc ON p.id = pc.post_id " +
        "        WHERE p.title LIKE :titlePattern " +
        "        ORDER BY p.created_on " +
        "    ) p_pc " +
        ") p_pc_r " +
        "WHERE p_pc_r.rank <= :rank ",
    resultSetMapping = "PostWithCommentByRankMapping"
)
@SqlResultSetMapping(
    name = "PostWithCommentByRankMapping",
    entities = {
        @EntityResult(
            entityClass = Post.class,
            fields = {
                @FieldResult(name = "id", column = "p.id"),
                @FieldResult(name = "createdOn", column = "p.created_on"),
                @FieldResult(name = "title", column = "p.title"),
            }
        ),
        @EntityResult(
            entityClass = PostComment.class,
            fields = {
                @FieldResult(name = "id", column = "pc.id"),
                @FieldResult(name = "createdOn", column = "pc.created_on"),
                @FieldResult(name = "review", column = "pc.review"),
                @FieldResult(name = "post", column = "pc.post_id"),
            }
        )
    }
)
The query can be executed like this:
List<Post> posts = entityManager
.createNamedQuery("PostWithCommentByRank")
.setParameter(
"titlePattern",
"High-Performance Java Persistence %"
)
.setParameter(
"rank",
5
)
.unwrap(NativeQuery.class)
.setResultTransformer(
new DistinctPostResultTransformer(entityManager)
)
.getResultList();
To transform the tabular result set back into an entity graph, you need a ResultTransformer which looks as follows:
public class DistinctPostResultTransformer
        extends BasicTransformerAdapter {

    private final EntityManager entityManager;

    public DistinctPostResultTransformer(
            EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    @Override
    public List transformList(
            List list) {
        Map<Serializable, Identifiable> identifiableMap =
            new LinkedHashMap<>(list.size());

        for (Object entityArray : list) {
            if (Object[].class.isAssignableFrom(entityArray.getClass())) {
                Post post = null;
                PostComment comment = null;

                Object[] tuples = (Object[]) entityArray;

                for (Object tuple : tuples) {
                    if (tuple instanceof Identifiable) {
                        entityManager.detach(tuple);

                        if (tuple instanceof Post) {
                            post = (Post) tuple;
                        }
                        else if (tuple instanceof PostComment) {
                            comment = (PostComment) tuple;
                        }
                        else {
                            throw new UnsupportedOperationException(
                                "Tuple " + tuple.getClass() + " is not supported!"
                            );
                        }
                    }
                }

                if (post != null) {
                    if (!identifiableMap.containsKey(post.getId())) {
                        identifiableMap.put(post.getId(), post);
                        post.setComments(new ArrayList<>());
                    }
                    if (comment != null) {
                        post.addComment(comment);
                    }
                }
            }
        }
        return new ArrayList<>(identifiableMap.values());
    }
}
That's it!

After increasing the memory for the JVM, things went much better. In the end I stopped using FETCH in those queries.

Related

Avoiding "HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!" using Spring Data [duplicate]

I'm getting a warning in the server log: "firstResult/maxResults specified with collection fetch; applying in memory!". Everything works fine, but I don't want this warning.
My code is
public employee find(int id) {
return (employee) getEntityManager().createQuery(QUERY).setParameter("id", id).getSingleResult();
}
My query is
QUERY = "from employee as emp left join fetch emp.salary left join fetch emp.department where emp.id = :id"
Although you are getting valid results, the SQL query fetches all the data, and it's not as efficient as it should be.
So, you have two options.
Fixing the issue with two SQL queries that can fetch entities in read-write mode
The easiest way to fix this issue is to execute two queries:
1. The first query will fetch the root entity identifiers matching the provided filtering criteria.
2. The second query will use the previously extracted root entity identifiers to fetch the parent and child entities.
This approach is very easy to implement and looks as follows:
List<Long> postIds = entityManager
.createQuery(
"select p.id " +
"from Post p " +
"where p.title like :titlePattern " +
"order by p.createdOn", Long.class)
.setParameter(
"titlePattern",
"High-Performance Java Persistence %"
)
.setMaxResults(5)
.getResultList();
List<Post> posts = entityManager
.createQuery(
"select distinct p " +
"from Post p " +
"left join fetch p.comments " +
"where p.id in (:postIds) " +
"order by p.createdOn", Post.class)
.setParameter("postIds", postIds)
.setHint(
"hibernate.query.passDistinctThrough",
false
)
.getResultList();
Fixing the issue with one SQL query that can only fetch entities in read-only mode
The second approach is to use DENSE_RANK over the result set of parent and child entities that match our filtering criteria and restrict the output to the first N post entries only.
The SQL query can look as follows:
@NamedNativeQuery(
    name = "PostWithCommentByRank",
    query =
        "SELECT * " +
        "FROM ( " +
        "    SELECT *, dense_rank() OVER (ORDER BY \"p.created_on\", \"p.id\") rank " +
        "    FROM ( " +
        "        SELECT p.id AS \"p.id\", " +
        "               p.created_on AS \"p.created_on\", " +
        "               p.title AS \"p.title\", " +
        "               pc.id AS \"pc.id\", " +
        "               pc.created_on AS \"pc.created_on\", " +
        "               pc.review AS \"pc.review\", " +
        "               pc.post_id AS \"pc.post_id\" " +
        "        FROM post p " +
        "        LEFT JOIN post_comment pc ON p.id = pc.post_id " +
        "        WHERE p.title LIKE :titlePattern " +
        "        ORDER BY p.created_on " +
        "    ) p_pc " +
        ") p_pc_r " +
        "WHERE p_pc_r.rank <= :rank ",
    resultSetMapping = "PostWithCommentByRankMapping"
)
@SqlResultSetMapping(
    name = "PostWithCommentByRankMapping",
    entities = {
        @EntityResult(
            entityClass = Post.class,
            fields = {
                @FieldResult(name = "id", column = "p.id"),
                @FieldResult(name = "createdOn", column = "p.created_on"),
                @FieldResult(name = "title", column = "p.title"),
            }
        ),
        @EntityResult(
            entityClass = PostComment.class,
            fields = {
                @FieldResult(name = "id", column = "pc.id"),
                @FieldResult(name = "createdOn", column = "pc.created_on"),
                @FieldResult(name = "review", column = "pc.review"),
                @FieldResult(name = "post", column = "pc.post_id"),
            }
        )
    }
)
The @NamedNativeQuery fetches all Post entities matching the provided title along with their associated PostComment child entities. The DENSE_RANK window function is used to assign a rank to each joined Post and PostComment record so that we can later keep just the number of Post records we are interested in fetching.
The @SqlResultSetMapping provides the mapping between the SQL-level column aliases and the JPA entity properties that need to be populated.
Now, we can execute the PostWithCommentByRank @NamedNativeQuery like this:
List<Post> posts = entityManager
.createNamedQuery("PostWithCommentByRank")
.setParameter(
"titlePattern",
"High-Performance Java Persistence %"
)
.setParameter(
"rank",
5
)
.unwrap(NativeQuery.class)
.setResultTransformer(
new DistinctPostResultTransformer(entityManager)
)
.getResultList();
Now, by default, a native SQL query like the PostWithCommentByRank one would fetch the Post and the PostComment in the same JDBC row, so we will end up with an Object[] containing both entities.
However, we want to transform the tabular Object[] array into a tree of parent-child entities, and for this reason, we need to use the Hibernate ResultTransformer.
The DistinctPostResultTransformer looks as follows:
public class DistinctPostResultTransformer
        extends BasicTransformerAdapter {

    private final EntityManager entityManager;

    public DistinctPostResultTransformer(
            EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    @Override
    public List transformList(
            List list) {
        Map<Serializable, Identifiable> identifiableMap =
            new LinkedHashMap<>(list.size());

        for (Object entityArray : list) {
            if (Object[].class.isAssignableFrom(entityArray.getClass())) {
                Post post = null;
                PostComment comment = null;

                Object[] tuples = (Object[]) entityArray;

                for (Object tuple : tuples) {
                    if (tuple instanceof Identifiable) {
                        entityManager.detach(tuple);

                        if (tuple instanceof Post) {
                            post = (Post) tuple;
                        }
                        else if (tuple instanceof PostComment) {
                            comment = (PostComment) tuple;
                        }
                        else {
                            throw new UnsupportedOperationException(
                                "Tuple " + tuple.getClass() + " is not supported!"
                            );
                        }
                    }
                }

                if (post != null) {
                    if (!identifiableMap.containsKey(post.getId())) {
                        identifiableMap.put(post.getId(), post);
                        post.setComments(new ArrayList<>());
                    }
                    if (comment != null) {
                        post.addComment(comment);
                    }
                }
            }
        }
        return new ArrayList<>(identifiableMap.values());
    }
}
The DistinctPostResultTransformer must detach the entities being fetched because we are overwriting the child collection and we don’t want that to be propagated as an entity state transition:
post.setComments(new ArrayList<>());
The reason for this warning is that when a fetch join is used, the order of the result set is defined only by the ID of the selected entity (and not by the join-fetched collection).
If this sorting in memory is causing problems, do not use firstResult/maxResults with JOIN FETCH.
To avoid this warning, change the getSingleResult call to
getResultList().get(0)
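Applied to the find method above, a minimal sketch of that change (guarding against an empty result) might look like this:
public employee find(int id) {
    // use getResultList() instead of getSingleResult(), as suggested above,
    // and guard against an empty result before calling get(0)
    List<employee> result = getEntityManager()
        .createQuery(QUERY, employee.class)
        .setParameter("id", id)
        .getResultList();
    return result.isEmpty() ? null : result.get(0);
}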
This warning tells you that Hibernate is performing the pagination in Java memory, which can cause high JVM memory consumption.
Since a developer can miss this warning, I contributed a flag to Hibernate that throws an exception instead of logging the warning (https://hibernate.atlassian.net/browse/HHH-9965).
The flag is hibernate.query.fail_on_pagination_over_collection_fetch.
I recommend everyone to enable it.
The flag is defined in org.hibernate.cfg.AvailableSettings:
/**
 * Raises an exception when in-memory pagination over collection fetch is about to be performed.
 * Disabled by default. Set to true to enable.
 *
 * @since 5.2.13
 */
String FAIL_ON_PAGINATION_OVER_COLLECTION_FETCH = "hibernate.query.fail_on_pagination_over_collection_fetch";
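As a minimal sketch of enabling the flag programmatically (the persistence unit name is hypothetical; with Spring Boot the same key can be set via spring.jpa.properties.*):
Map<String, Object> props = new HashMap<>();
props.put("hibernate.query.fail_on_pagination_over_collection_fetch", "true");
EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("my-persistence-unit", props);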
The problem is that the JOIN produces a Cartesian product: the offset cuts the result set without checking whether you are still on the same root entity.
I guess emp has many departments, which is a one-to-many relationship. Hibernate fetches many rows for this query, including the fetched department records, so the order of the result set cannot be determined until the results have actually been fetched into memory. That is why the pagination is done in memory.
If you do not want to fetch the departments together with emp, but still want to query based on the department, you can get the result without the warning (and without sorting in memory) by simply removing the "fetch" clause. Something like the following:
QUERY = "from employee as emp left join emp.salary sal left join emp.department dep where emp.id = :id and dep.name = 'testing' and sal.salary > 5000 "
As others pointed out, you should generally avoid using JOIN FETCH and firstResult/maxResults together.
If your query requires it, you can use .stream() to eliminate the warning and avoid a potential OOM exception:
try (Stream<ENTITY> stream = em.createQuery(QUERY).stream()) {
    ENTITY first = stream.findFirst().orElse(null); // equivalent to .getSingleResult()
}
// The returned stream is an IO stream that needs to be closed manually.

Spring Data paginate and sort a NamedNativeQuery (JPA-Hibernate-MySql)

I'm using NamedNativeQueries with SqlResultSetMappings in a Spring Data (JPA, Hibernate, MySQL) application, and I've been successful with the pagination, but not with the sorting.
I've tried two forms of queries:
@NamedNativeQuery(
    name = "DatasetDetails.unallocatedDetailsInDataset",
    resultClass = DatasetDetails.class,
    resultSetMapping = "DatasetDetails.detailsForAllocation",
    query = "SELECT dd.id, fk_datasets_id, fk_domains_id, fk_sources_id, dom.name AS domain, " +
            "src.name AS source " +
            "FROM datasets AS d " +
            "JOIN datasets_details AS dd ON dd.fk_datasets_id = d.id " +
            "JOIN sources AS src ON src.id = dd.fk_sources_id " +
            "JOIN domains AS dom ON dom.id = dd.fk_domains_id " +
            "WHERE fk_datasets_id = :datasetId " +
            "AND dd.id NOT IN (" +
            "SELECT fk_datasets_details_id from allocations_datasets_details) \n/* #page */\n"),
and the second is simply using the count notation on a second query instead of using the #page notation.
@NamedNativeQuery(
    name = "DatasetDetails.unallocatedDetailsInDataset.count",
    resultClass = DatasetDetails.class,
    resultSetMapping = "DatasetDetails.detailsForAllocation",
    query = "SELECT count(*)
    ....
Both methods work for pagination, but the sorting is ignored.
Here is the repository:
public interface DatasetDetailsRepository extends PagingAndSortingRepository<DatasetDetails, Long> {
    @Query(nativeQuery = true)
    List<DatasetDetails> unallocatedDetailsInDataset(@Param("datasetId") long datasetId,
                                                     @Param("page") Pageable page);
}
And the pageable gets assembled like this:
Sort sort = Sort.by(Sort.Order.asc(DatasetDetails.DOMAIN), Sort.Order.asc(DatasetDetails.SOURCE));
Pageable pageable = PageRequest.of(page, limit, sort);
No errors are thrown, but the sorting simply doesn't get done and no ORDER BY is generated.
Explicitly adding something like ORDER BY #{#page} won't compile.
I encountered the same problem: I had to dynamically filter and sort a NamedNativeQuery by different columns and directions, and the Sort was apparently ignored. I found the following workaround, which is not necessarily pretty, but it does the job.
For the repository:
List<MyEntity> findMyEntities(
    @Param("entityId") long entityId,
    @Param("sortColumn") String sortColumn,
    @Param("sortDirection") String sortDirection,
    Pageable page);
The native queries look like this:
@NamedNativeQueries({
    @NamedNativeQuery(name = "MyEntity.findMyEntities",
        query = "select e.field1, e.field2, ..." +
                " from my_schema.my_entities e" +
                " where condition1 and condition2 ..." +
                " order by " +
                " CASE WHEN :sortColumn = 'name' and :sortDirection = 'asc' THEN e.name END ASC," +
                " CASE WHEN :sortColumn = 'birthdate' and :sortDirection = 'asc' THEN e.birthdate END ASC," +
                " CASE WHEN :sortColumn = 'name' and :sortDirection = 'desc' THEN e.name END DESC," +
                " CASE WHEN :sortColumn = 'birthdate' and :sortDirection = 'desc' THEN e.birthdate END DESC"
    ),
    @NamedNativeQuery(name = "MyEntity.findMyEntities.count",
        query = "select count(*) from my_schema.my_entities e" +
                " where condition1 and condition2 ..." +
                " and :sortColumn = :sortColumn and :sortDirection = :sortDirection"
    )
})
Notice that in the count query I use the two redundant conditions for :sortColumn and :sortDirection, because once they are specified as @Param in the repository method, they need to be used in the actual query.
When calling the function, my service had a boolean that dictates the direction and a string that dictates the sort column, like this:
public List<MyEntity> serviceFindFunction(Long entityId, String sortColumn, Boolean sortDirection, Integer pageNumber, Integer pageSize) {
    String sortDir = sortDirection ? "asc" : "desc";
    Pageable pageable = new PageRequest(pageNumber, pageSize); // Spring Data 1.x syntax
    // for Spring Data 2.x, as you were using, simply:
    // Pageable pageable = PageRequest.of(pageNumber, pageSize);
    return entityRepository.findMyEntities(entityId, sortColumn, sortDir, pageable);
}
The two things I don't like about this are the redundant use of the sortColumn and sortDirection params in the count query, and the way the ORDER BY statement is written. The reason for the separate CASE statements is that the columns I sorted by had different data types, and if they are incompatible (e.g. nvarchar and date), the query fails with the error:
Conversion failed when converting date and/or time from character string
I could probably also nest the conditionals, i.e. first make a CASE for the direction and then an inner CASE for the columns, but my SQL skills only went this far.
Hope this helps! Any feedback or improvements are very welcomed.

Why does Hibernate generate ~500 SQL queries?

While trying to optimize slow MySQL queries generated by Hibernate 4.2 in a legacy project, I found that the code below generates nearly 500 SQL queries (with many duplicates):
class MyDAO {
public List<Message> findMessages() {
Session session = MyHibernateUtils.openSession();
String queryStr = "SELECT DISTINCT m FROM Message m "
+ " LEFT JOIN fetch m.types types "
+ " LEFT JOIN fetch m.mainType mainType "
+ " LEFT JOIN fetch m.place place "
+ " LEFT JOIN fetch m.building building "
+ " LEFT JOIN fetch m.city city "
+ " LEFT JOIN fetch m.kind kind "
+ " LEFT JOIN fetch m.domain domain "
+ " LEFT JOIN fetch m.action action "
+ " LEFT JOIN fetch m.customParameterA customParameterA "
+ " LEFT JOIN fetch m.customParameterB customParameterB "
+ " LEFT JOIN fetch m.scheduleEvents scheduleEvents "
+ " LEFT JOIN fetch m.comments comments "
+ " LEFT JOIN fetch m.messageLastActivities messageLastActivities "
+ " LEFT JOIN fetch m.customListA customListA "
+ " LEFT JOIN fetch m.childEvents childEvents "
+ " LEFT JOIN fetch m.parentEvent parentEvent "
+ " WHERE ...";
List<Message> messages;
try {
session.getTransaction().begin();
Query query = session.createQuery(queryStr);
query.setTimeout(10);
messages = query.list();
session.getTransaction().commit();
} catch (RuntimeException e) {
session.getTransaction().rollback();
throw e;
} finally {
session.close();
}
return messages;
}
}
How can I avoid having so many SQL queries?
I don't know if it helps, but there are many OneToMany and ManyToMany relationships between the entities.
Thanks for your help.
You should check the queries Hibernate is generating to see which table is accessed most frequently.
You also have to join fetch the entities related to your related entities. See here:
Hibernate is doing multiple select requests instead one (using join fetch)
I personally prefer lazy loading with an annotated @BatchSize() to keep the lazy-query count small. Even a batch size of 2 will already cut your query count in half.
Also have a look at the @Cache annotation, which can reduce your query count significantly (just think of all the almost-static stuff like city/building/type/domain and the like). A short sketch of both ideas follows.
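A minimal sketch of both suggestions, using org.hibernate.annotations.BatchSize and @Cache on hypothetical Message/Comment/City entities (ids and other fields omitted):
@Entity
public class Message {

    // Lazy collection loaded in batches: one select ... in (...) query per
    // 16 owning Message rows instead of one query per Message.
    @OneToMany(mappedBy = "message", fetch = FetchType.LAZY)
    @BatchSize(size = 16)
    private List<Comment> comments = new ArrayList<>();
}

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class City {
    // almost-static reference data served from the second-level cache
    // (requires a configured cache provider, e.g. Ehcache)
}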
Depending on your relationship design: the default fetch value of @OneToMany and @ManyToMany is LAZY, which means that to load a related record of a child association (when you call its getter), Hibernate executes one more query to load that record (for example: select * from foo where id = ?). So if the loaded (main) entity contains many child associations such as ManyToMany or OneToMany, you will see many queries in the console.
To avoid these queries you can set fetch to EAGER, but this is not recommended from an optimization standpoint.
@Entity
public class MainEntity {

    @ManyToMany(fetch = FetchType.EAGER)
    public List<Foo> foos;
}

How to map Results when Querying raw SQL using Ebean

Using Postgres Tables created by Ebean, I would like to query these tables with a hand-written statement:
SELECT r.name,
r.value,
p.name as param1,
a.name as att1,
p2.name as param2,
a2.name as att2
FROM compatibility c
JOIN attribute a ON c.att1_id = a.id
JOIN attribute a2 ON c.att2_id = a2.id
JOIN PARAMETER p ON a.parameter_id = p.id
JOIN PARAMETER p2 ON a2.parameter_id = p2.id
JOIN rating r ON c.rating_id = r.id
WHERE p.problem_id = %d
OR p2.problem_id = %d
Each of the joined tables represent one of my model classes.
The query executes fine, but I don't know how I would proceed:
How do I even execute the query using Play 2.2. and Ebean?
How can I map this query to an iterable object? Do I need to create a Model class which contains all the fields from the query, or can I use some sort of HashMap?
How can I parameterize the query in a safe way?
To execute this query you need to use the RawSql class. You will also have to create a class to which the results will be mapped.
Here is the code of an exemplary result class:
import javax.persistence.Entity;

import com.avaje.ebean.annotation.Sql;

@Entity
@Sql
public class Result {

    String name;
    Integer value;
    String param1;
    String param2;
    String att1;
    String att2;
}
And here is an example of executing the query:
String sql
    = " SELECT r.name,"
    + " r.value,"
    + " p.name as param1,"
    + " a.name as att1,"
    + " p2.name as param2,"
    + " a2.name as att2"
    + " FROM compatibility c"
    + " JOIN attribute a ON c.att1_id = a.id"
    + " JOIN attribute a2 ON c.att2_id = a2.id"
    + " JOIN PARAMETER p ON a.parameter_id = p.id"
    + " JOIN PARAMETER p2 ON a2.parameter_id = p2.id"
    + " JOIN rating r ON c.rating_id = r.id"
    + " WHERE p.problem_id = %d"
    + " OR p2.problem_id = %d";

RawSql rawSql =
    RawSqlBuilder
        .parse(sql)
        .columnMapping("r.name", "name")
        .columnMapping("r.value", "value")
        .create();

Query<Result> query = Ebean.find(Result.class);
query.setRawSql(rawSql)
    .where().gt("amount", 10);

List<Result> list = query.findList();
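Regarding the third question (parameterizing the query safely): Ebean's raw SQL also accepts named parameters, so a sketch replacing the %d placeholders could look like this (the parameter name is illustrative):
String sql
    = " SELECT r.name, r.value,"
    + " p.name as param1, a.name as att1,"
    + " p2.name as param2, a2.name as att2"
    + " FROM compatibility c"
    + " JOIN attribute a ON c.att1_id = a.id"
    + " JOIN attribute a2 ON c.att2_id = a2.id"
    + " JOIN PARAMETER p ON a.parameter_id = p.id"
    + " JOIN PARAMETER p2 ON a2.parameter_id = p2.id"
    + " JOIN rating r ON c.rating_id = r.id"
    + " WHERE p.problem_id = :problemId"
    + " OR p2.problem_id = :problemId";

RawSql rawSql = RawSqlBuilder
    .parse(sql)
    .columnMapping("r.name", "name")
    .columnMapping("r.value", "value")
    .create();

List<Result> results = Ebean.find(Result.class)
    .setRawSql(rawSql)
    .setParameter("problemId", problemId) // bound safely, no string formatting
    .findList();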

How to map query to non-entity class + entity class

I know how to do query-to-resultClass mapping in iBatis.
How can I map a native query result to an object that is a mix of an entity class and scalars in Hibernate? And how can I set the parameters?
Please help.
With the Hibernate Session API, you can do it by combining the addEntity() and addScalar() methods:
Query q = s.createSQLQuery(
"select p.*, count(e.id) as c " +
"from Project p left join Employee e on p.id = e.project_id " +
"group by p.id")
.addEntity(Project.class).addScalar("c");
In JPA you can do it with @SqlResultSetMapping:
@SqlResultSetMappings(
    @SqlResultSetMapping(name = "projectWithCount",
        entities = @EntityResult(entityClass = Project.class),
        columns = @ColumnResult(name = "c")))
...
Query q = em.createNativeQuery(
    "...", "projectWithCount");
