This is a simple Product entity that refers to a subgroup:
public class Product implements Comparable<Product> {
...
@ManyToOne(optional=false, fetch=FetchType.LAZY)
@NotNull
private ProductSubGroup productSubGroup;
...
}
I have a map containing Products in another entity:
public class FinishedProduct {
...
@NotNull
@ManyToOne
private Product product;
@ElementCollection(fetch=FetchType.LAZY)
@MapKeyJoinColumn
@Column(name="amount")
@Sort(type=SortType.NATURAL)
@Fetch(FetchMode.SUBSELECT)
private SortedMap<Product, Double> byproducts = new TreeMap<>();
...
}
I can load the map with this code:
Root<FinishedProduct> root = q.from(FinishedProduct.class);
root.fetch("product", JoinType.LEFT);
root.fetch("byproducts", JoinType.LEFT);
This works, but I also need the productSubGroup of the Products stored as keys in the byproducts map, without generating n+1 selects. How can I fetch them? Just adding another fetch to the end results in an exception:
root.fetch("byproducts", JoinType.LEFT).fetch("productSubGroup", JoinType.LEFT);
org.springframework.dao.InvalidDataAccessApiUsageException:
Collection of values [null] cannot be source of a fetch
I also tried fooling around with MapJoin and got the same exception:
MapJoin<FinishedProduct,Product,Double> map = root.joinMap("byproducts", JoinType.LEFT);
map.fetch("productSubGroup", JoinType.LEFT);
I guess I somehow need to refer to the map key, but no idea how.
These are fairly complex mappings and I am not sure there is an easier way to accomplish this. Hopefully somebody will provide a better answer, but as an alternative you can always pre-load into the persistence context all the entity instances that you know would otherwise be fetched with n+1 selects.
So, before firing your query, just load all ProductSubGroups which are expected to be fetched:
select p.productSubGroup from Product p
where p in (select index(byproducts) from FinishedProduct)
Of course, repeat any additional restrictions on FinishedProduct from your original query in the subquery, so that you avoid loading ProductSubGroups you don't need.
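For example, a sketch of firing that pre-load through the EntityManager right before the Criteria query (this assumes the same EntityManager em is used for both, and that ProductSubGroup is the mapped entity type):
// Pre-load sketch: pulls the ProductSubGroups into the persistence context so the
// later Criteria query does not trigger one extra select per subgroup.
List<ProductSubGroup> preloaded = em.createQuery(
        "select p.productSubGroup from Product p "
      + "where p in (select index(byproducts) from FinishedProduct)",
        ProductSubGroup.class)
    .getResultList();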
As an even better alternative (in my opinion), you may want to consider defining @BatchSize for the Product.productSubGroup association. That way the ProductSubGroups would be loaded in batches instead of one by one.
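For a to-one association, Hibernate reads the batch size from the target entity, so a minimal sketch could look like this (the size of 25 is an arbitrary example):
import org.hibernate.annotations.BatchSize;

@Entity
@BatchSize(size = 25) // up to 25 uninitialized ProductSubGroup proxies are loaded per select
public class ProductSubGroup {
    // ...
}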
Related
Given the following domain model, I want to load all Answers including their Values and their respective sub-children, and put them into AnswerDTOs to then convert to JSON. I have a working solution, but it suffers from the N+1 problem that I want to get rid of by using an ad-hoc @EntityGraph. All associations are configured LAZY.
#Query("SELECT a FROM Answer a")
#EntityGraph(attributePaths = {"value"})
public List<Answer> findAll();
Using an ad-hoc @EntityGraph on the repository method I can ensure that the values are pre-fetched to prevent N+1 on the Answer->Value association. While my result is fine, there is another N+1 problem caused by lazy loading of the selected association of the MCValues.
Using this
@EntityGraph(attributePaths = {"value.selected"})
fails, because the selected field is of course only part of some of the Value entities:
Unable to locate Attribute with the the given name [selected] on this ManagedType [x.model.Value];
How can I tell JPA to only try fetching the selected association when the value is an MCValue? I need something like optionalAttributePaths.
You can only use an EntityGraph if the association attribute is part of the superclass and by that also part of all subclasses. Otherwise, the EntityGraph will always fail with the Exception that you currently get.
The best way to avoid your N+1 select issue is to split your query into 2 queries:
The 1st query fetches the MCValue entities using an EntityGraph to fetch the association mapped by the selected attribute. After that query, these entities are then stored in Hibernate's 1st level cache / the persistence context. Hibernate will use them when it processes the result of the 2nd query.
#Query("SELECT m FROM MCValue m") // add WHERE clause as needed ...
#EntityGraph(attributePaths = {"selected"})
public List<MCValue> findAll();
The 2nd query then fetches the Answer entity and uses an EntityGraph to also fetch the associated Value entities. For each Value entity, Hibernate will instantiate the specific subclass and check if the 1st level cache already contains an object for that class and primary key combination. If that's the case, Hibernate uses the object from the 1st level cache instead of the data returned by the query.
#Query("SELECT a FROM Answer a")
#EntityGraph(attributePaths = {"value"})
public List<Answer> findAll();
Because we already fetched all MCValue entities with the associated selected entities, we now get Answer entities with an initialized value association. And if the association contains an MCValue entity, its selected association will also be initialized.
I don't know what Spring Data is doing there, but to do that you usually have to use the TREAT operator to access the sub-association, and the implementation of that operator is quite buggy.
Hibernate supports implicit subtype property access, which is what you would need here, but apparently Spring Data can't handle this properly. I can recommend taking a look at Blaze-Persistence Entity Views, a library that works on top of JPA and allows you to map arbitrary structures against your entity model. You can map your DTO model in a type-safe way, including the inheritance structure. Entity views for your use case could look like this:
@EntityView(Answer.class)
interface AnswerDTO {
@IdMapping
Long getId();
ValueDTO getValue();
}
@EntityView(Value.class)
@EntityViewInheritance
interface ValueDTO {
@IdMapping
Long getId();
}
@EntityView(TextValue.class)
interface TextValueDTO extends ValueDTO {
String getText();
}
@EntityView(RatingValue.class)
interface RatingValueDTO extends ValueDTO {
int getRating();
}
@EntityView(MCValue.class)
interface MCValueDTO extends ValueDTO {
@Mapping("selected.id")
Set<Long> getOption();
}
With the Spring Data integration provided by Blaze-Persistence you can define a repository like this and use the result directly:
@Transactional(readOnly = true)
interface AnswerRepository extends Repository<Answer, Long> {
List<AnswerDTO> findAll();
}
It will generate an HQL query that selects just what you mapped in the AnswerDTO, which is something like the following:
SELECT
a.id,
v.id,
TYPE(v),
CASE WHEN TYPE(v) = TextValue THEN v.text END,
CASE WHEN TYPE(v) = RatingValue THEN v.rating END,
CASE WHEN TYPE(v) = MCValue THEN s.id END
FROM Answer a
LEFT JOIN a.value v
LEFT JOIN v.selected s
My latest project used GraphQL (a first for me) and we had a big issue with N+1 queries and trying to optimize the queries to only join tables when they are required. I have found Cosium/spring-data-jpa-entity-graph irreplaceable. It extends JpaRepository and adds methods to pass an entity graph into the query. You can then build dynamic entity graphs at runtime to add left joins for only the data you need.
Our data flow looks something like this:
Receive GraphQL request
Parse GraphQL request and convert to list of entity graph nodes in the query
Create entity graph from the discovered nodes and pass into the repository for execution
To keep invalid nodes out of the entity graph (for example __typename from GraphQL), I created a utility class which handles the entity graph generation. The calling class passes in the class it is generating the graph for, and the utility validates each node in the graph against the metamodel maintained by the ORM. If a node is not in the model, it is removed from the list of graph nodes. (This check needs to be recursive and check each child as well.)
Before finding this I had tried projections and every other alternative recommended in the Spring JPA / Hibernate docs, but nothing seemed to solve the problem elegantly, or at least not without a ton of extra code.
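A flat (non-recursive) plain-JPA sketch of that validation idea; requestedNodes stands for the attribute names parsed from the GraphQL query, and the spring-data-jpa-entity-graph API itself looks different:
// Build a dynamic entity graph and keep only attribute names known to the JPA metamodel.
EntityGraph<Answer> graph = entityManager.createEntityGraph(Answer.class);
ManagedType<Answer> type = entityManager.getMetamodel().managedType(Answer.class);
for (String node : requestedNodes) {
    boolean known = type.getAttributes().stream()
            .anyMatch(attr -> attr.getName().equals(node));
    if (known) {                       // drops nodes such as __typename
        graph.addAttributeNodes(node);
    }
}
List<Answer> answers = entityManager
        .createQuery("SELECT a FROM Answer a", Answer.class)
        .setHint("javax.persistence.loadgraph", graph)
        .getResultList();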
Edited after your comment:
My apologies, I hadn't understood your issue in the first round: it occurs on startup of Spring Data, not only when you call findAll().
The full example can be pulled from my GitHub:
https://github.com/bdzzaid/stackoverflow-java/blob/master/jpa-hibernate/
You can easily reproduce and fix your issue inside this project.
Effectively, Spring Data and Hibernate cannot determine the "selected" graph by default, so you need to specify how to collect the selected option.
So first, you have to declare the NamedEntityGraphs of the Answer class.
As you can see, there are two NamedEntityGraphs for the value attribute of the Answer class:
The first is for all Values, with no specific relationship to load.
The second is for the specific multi-choice value. If you remove this one, you reproduce the exception.
Second, you need to be in a transactional context when calling answerRepository.findAll() if you want to fetch data of type LAZY.
@Entity
@Table(name = "answer")
@NamedEntityGraphs({
@NamedEntityGraph(
name = "graph.Answer",
attributeNodes = @NamedAttributeNode(value = "value")
),
@NamedEntityGraph(
name = "graph.AnswerMultichoice",
attributeNodes = @NamedAttributeNode(value = "value", subgraph = "graph.AnswerMultichoice.selected"),
subgraphs = {
@NamedSubgraph(
name = "graph.AnswerMultichoice.selected",
attributeNodes = {
@NamedAttributeNode("selected")
}
)
}
)
}
)
public class Answer
{
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(updatable = false, nullable = false)
private int id;
@OneToOne(cascade = CascadeType.ALL)
@JoinColumn(name = "value_id", referencedColumnName = "id")
private Value value;
// ..
}
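A minimal sketch of the transactional caller mentioned above (class and method names are illustrative):
@Service
public class AnswerService {

    private final AnswerRepository answerRepository;

    public AnswerService(AnswerRepository answerRepository) {
        this.answerRepository = answerRepository;
    }

    // Lazy parts of the graph can be navigated safely while the transaction is open.
    @Transactional(readOnly = true)
    public List<Answer> loadAnswers() {
        return answerRepository.findAll();
    }
}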
What is a good way to force initialization of a lazily loaded field in each object of a collection?
At the moment the only thing that comes to my mind is a for-each loop that iterates through the collection and calls the getter of that field, but that's not very efficient. The collection can have up to 1k objects, and in that case every iteration will fire a query to the db.
I can't change the way I fetch objects from DB.
Example of code.
public class TransactionData{
@ManyToOne(fetch = FetchType.LAZY)
private CustomerData customer;
...
}
List<TransactionData> transactions = getTransactions();
You may define Entity Graphs to overrule the default fetch types, as they are defined in the Mapping.
See the example below
@Entity
@NamedEntityGraph(
name = "Person.addresses",
attributeNodes = @NamedAttributeNode("addresses")
)
public class Person {
...
@OneToMany(fetch = FetchType.LAZY) // default fetch type
private List<Address> addresses;
...
}
In the following query the addresses will now be loaded eagerly:
EntityGraph entityGraph = entityManager.getEntityGraph("Person.addresses");
TypedQuery<Person> query = entityManager.createNamedQuery("Person.findAll", Person.class);
query.setHint("javax.persistence.loadgraph", entityGraph);
List<Person> persons = query.getResultList();
In that way you are able to define specific fetch behaviour for each different use case.
See also:
http://www.thoughts-on-java.org/jpa-21-entity-graph-part-1-named-entity/
https://docs.oracle.com/javaee/7/tutorial/persistence-entitygraphs001.htm
By the way: AFAIK most JPA providers perform eager loading of @XXXtoOne relations, even if you define the mapping as lazy. The JPA spec does allow this behaviour, as lazy loading is always just a hint that the data may or may not be loaded immediately. Eager loading, on the other hand, has to be performed immediately.
What you can do is something like this:
//lazily loaded
List<Child> childList = parent.getChild();
// this loads all the children of that particular Parent into memory
Integer childListSize = childList.size();
But if you eager load, then all the children will be loaded for every parent, so calling size() like this should be your best bet.
What is a best practice for storing 'large' data, represented by a List in Java, in a database?
I'm considering 3 variants:
Use @OneToMany to store the data in a separate table.
Serialize the data and store it in the parent table.
Store the data as files (naming conventions? same as id?).
To be more specific
'Large' data entities:
class SingleSleeper{
private Double startPositionOnLeft;
private Double endPositionOnLeft;
private Double startPositionOnRight;
private Double endPositionOnRight;
....
}
class RutEntry{
private Double width;
private Double position;
...
}
There are about 50 instances of the SingleSleeper class and about 25000 instances of the RutEntry class in one parent instance. Parent instances are generated about 40 times every day.
I'm using EclipseLink JPA 2.1 and Derby.
Addition
Most of all I'm interested in the best readability in Java. But I'm afraid that database speed will decrease significantly if I store too much data in the database. The overwhelming majority of requests will select all SingleSleeper or RutEntry instances of a particular parent entity. I don't need support for different database types, but I can move to another database if needed.
I think I would do neither of your variants.
I would add a ManyToOne to the child entities (which is essentially the opposite of your first variant):
public class SingleSleeper {
@ManyToOne(optional = false, fetch = FetchType.LAZY)
private ParentEntity parent;
...
}
public class RutEntry {
@ManyToOne(optional = false, fetch = FetchType.LAZY)
private ParentEntity parent;
}
This ensures that you have a mapping and that you never load all 25000 entities for a parent object if you don't need them (the lazy fetch ensures that you don't even need to load the parent entity).
You can create a OneToMany in the parent object with a mappedBy link, if you really want to. For example because you always need all child objects in the parent entity:
class ParentEntity {
@OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
Collection<SingleSleeper> singleSleepers;
@OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
Collection<RutEntry> rutEntries;
}
But I don't know how EclipseLink behaves here - for Hibernate you need at least an additional BatchSize annotation to indicate that it should load as many child entities as possible at once. You can't fetch both collections together with the parent instance (e.g. by defining both as FetchType.EAGER), as only one collection is allowed to be fetched eagerly (and otherwise you would have 25000 * 50 rows in the result set of the corresponding SQL select statement).
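For Hibernate, that hint could look like the following sketch (the size of 50 is an arbitrary example; EclipseLink offers a comparable batch-fetch annotation of its own):
@OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
@org.hibernate.annotations.BatchSize(size = 50) // initializes this collection for up to 50 parents per select
Collection<SingleSleeper> singleSleepers;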
The best way to load all child entities for a parent entity is to load them separately, either using JPQL (easier to read, faster to write) or the Criteria API (typesafe, but you need a metamodel):
ParentEntity parent = entityManager.find(ParentEntity.class, id);
// JPQL:
List<SingleSleeper> singleSleepers = entityManager.createQuery(
"SELECT s FROM SingleSleeper s WHERE s.parent = %parent"
).setParameter("parent", parent).getResultList();
// Or Criteria API:
CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
CriteriaQuery<SingleSleeper> query = criteriaBuilder.createQuery(SingleSleeper.class);
Root<SingleSleeper> s = query.from(SingleSleeper.class);
query.select(s).where(criteriaBuilder.equal(s.get(SingleSleeper_.parent), parent));
List<SingleSleeper> singleSleepers = entityManager.createQuery(query).getResultList();
You have three advantages of that approach:
Still easy to read - if you put the loading into its own method.
You are flexible to decide when to load the 25050 children.
You can load a subset of the children as well (by modifying the result of createQuery with Query.setFirstResult and Query.setMaxResults).
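For example, a sketch that loads only the first page of RutEntry children (the page size of 500 is arbitrary):
List<RutEntry> firstPage = entityManager.createQuery(
        "SELECT r FROM RutEntry r WHERE r.parent = :parent", RutEntry.class)
    .setParameter("parent", parent)
    .setFirstResult(0)      // offset
    .setMaxResults(500)     // page size
    .getResultList();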
Let's say I have an entity:
@Entity
public class Person {
@Id
@GeneratedValue
private Long id;
@ManyToMany(fetch = FetchType.LAZY)
private List<Role> roles;
@ManyToMany(fetch = FetchType.LAZY)
private List<Permission> permissions;
// etc
// other fields here
}
I want to build a query using the Criteria API that filters these users and shows a list of people and, among other info from the entity, how many roles a person has.
CriteriaBuilder builder = em.getCriteriaBuilder();
CriteriaQuery<Person> query = builder.createQuery(Person.class);
Root<Person> personRoot = query.from(Person.class);
// predicates here
However, this limits me to returning only a list of Person entities. I can always add a @Transient field to the entity, but this seems ugly, since I might have many different queries and could end up with many such fields.
On the other hand, I can't use HQL and write the query by hand, since I want complex filtering and I would have to deal with appending things to and removing things from the HQL query.
My question, besides the one in the title of this post, is this: how do I query the database using the Criteria API and return a non-entity (in case I want to filter the Person table but return only the number of roles, permissions, etc.), and how do I do it for something very close to the actual entity (like the example with a role counter instead of the roles collection)?
UPDATE
Using Hibernate's projections I came up with this, but I still don't know what to write in the TODO. Projections.count doesn't work since it expects some kind of grouping, and I can't seem to find any examples in the Hibernate documentation.
Criteria cr = session.createCriteria(Person.class);
if (id != null) {
cr.add(Restrictions.eq("id", id));
}
ProjectionList projectionList = Projections.projectionList();
projectionList.add(Projections.property("id"), "id");
projectionList.add(TODO, "rolesCount");
CriteriaBuilder builder = entityManager.getCriteriaBuilder();
CriteriaQuery<Long> query = builder.createQuery(Long.class);
query.select(builder.countDistinct(root));
works for me :)
how do I do it for something very close to the actual entity (like the example with the role counter instead of the roles collection)?
You could make these values properties of your User entity by various means, for example using a Hibernate @Formula property. This will issue an inline subquery on entity load to get the count without touching the collection.
#Formula("select count(*) from roles where user_id = ?")
private int numberOfRoles;
Another (JPA-compliant) option is to handle these calculated fields by creating a view at the database level and then mapping it to your User:
e.g.
@OneToOne
private UserData userData; // entity mapped to your view (works just like a table)
....
public int getNumberOfRoles(){
return userData.getRoleCount();
}
or
by using @SecondaryTable to join this User data.
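A sketch of the @SecondaryTable variant, assuming a database view named user_role_counts with a user_id key and a role_count column (both names are illustrative):
@Entity
@SecondaryTable(name = "user_role_counts",
        pkJoinColumns = @PrimaryKeyJoinColumn(name = "user_id"))
public class Person {

    @Id
    @GeneratedValue
    private Long id;

    // read-only column coming from the view
    @Column(table = "user_role_counts", name = "role_count",
            insertable = false, updatable = false)
    private int numberOfRoles;
}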
I have to do bulk inserts, and need the ids of what's being added. This is a basic example that shows what I am doing (which is obviously horrible for performance). I am looking for a much better way to do this.
public void omgThisIsSlow(final Set<ObjectOne> objOneSet,
final Set<ObjectTwo> objTwoSet) {
for (final ObjectOne objOne : objOneSet) {
persist(objOne);
for (final ObjThree objThree : objOne.getObjThreeSet()) {
objThree.setObjOne(objOne);
persist(objThree);
}
for (final ObjectTwo objTwo : objTwoSet) {
final ObjectTwo objTwoCopy = new ObjectTwo();
objTwoCopy.setFoo(objTwo.getFoo());
objTwoCopy.setBar(objTwo.getBar());
persist(objTwoCopy);
final ObjectFour objFour = new ObjectFour();
objFour.setObjOne(objOne);
objFour.setObjTwo(objTwoCopy);
persist(objFour);
}
}
}
In the case above persist is a method which internally calls
sessionFactory.getCurrentSession().saveOrUpdate();
Is there any optimized way of getting back the ids and bulk inserting based upon that?
Thanks!
Update: Got it working with the following additions and help from JustinKSU
import javax.persistence.*;
@Entity
public class ObjectFour{
@ManyToOne(cascade = CascadeType.ALL)
private ObjectOne objOne;
@ManyToOne(cascade = CascadeType.ALL)
private ObjectTwo objTwo;
}
// And similar for other classes and their objects that need to be persisted
If you define the relationships using annotations and define appropriate cascading, you should be able to set the object relationships in Java and persist it all in one call. Hibernate will handle setting the foreign keys for you.
Documentation -
http://docs.jboss.org/hibernate/annotations/3.5/reference/en/html/entity.html#entity-mapping-association
An example annotation on a parent object would be
@OneToMany(mappedBy = "foo", fetch = FetchType.LAZY, cascade=CascadeType.ALL)
On the child object you would do the following
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "COLUMN_NAME", nullable = false)
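A usage sketch of the cascading approach (Parent, Child, getChildren() and setFoo(...) are illustrative names matching the mappedBy = "foo" mapping above):
// With the cascade in place, wiring the children and persisting the parent once is
// enough; Hibernate inserts the children and fills in the foreign keys and ids.
Parent parent = new Parent();
Child child = new Child();
child.setFoo(parent);              // owning side of the association
parent.getChildren().add(child);   // inverse side, kept in sync
session.persist(parent);           // cascades the persist to 'child'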
I'm not sure, but I believe Hibernate can do bulk inserts/updates. The problem, as I understand it, is that you need to persist the parent object in order to assign the reference to the child objects.
I would try to persist all the "one" objects first, and then iterate over all their "three" objects and persist them in a second bulk insertion.
If your tree has three levels, you can achieve all the insertions in 3 batches. Pretty decent, I think.
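A sketch of such a batched first pass, assuming hibernate.jdbc.batch_size is set (e.g. to 50) and a non-IDENTITY id generator so the inserts can actually be batched; session.clear() can be added after each flush if the flushed objects are not needed in later passes:
final Session session = sessionFactory.getCurrentSession();
int count = 0;
for (final ObjectOne objOne : objOneSet) {
    session.persist(objOne);   // id is assigned here or at flush time
    if (++count % 50 == 0) {   // match the JDBC batch size
        session.flush();       // pushes the pending inserts as one JDBC batch
    }
}
session.flush();               // flush the remainder before the next level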
Assuming that you are just looking to persist a large amount of data in one go, and your problem is that you don't know what the IDs are going to be as the various related objects are persisted, one possible solution is to run all your inserts (as bulk inserts) into ancillary tables (one per real table) with temporary IDs (and some session ID), and have a stored procedure perform the inserts into the real tables while resolving the IDs.