JPA + Hibernate join fetch returning inconsistent data

Using JPA + Hibernate 4, we are executing a query that returns inconsistent data.
We have one "parent" table:
PARENT
id *
req_num
active
creation_date
and one "child" table:
CHILD
id *
type
name
email
One parent can have many children, and this is mapped in the database through another table:
PARENT_CHILDS
parent_id (FK to PARENT)
child_id (FK to CHILD)
child_order
In Java, our Parent class has a @OneToMany-annotated List named childs. Both classes are annotated with @Entity.
We're using org.hibernate.cfg.ImprovedNamingStrategy as our naming strategy.
The query we're executing is:
select parent from Parent parent join fetch parent.childs child where child.type IN ('01', '02') and child.email = 'mail@mail.com' and parent.active = 1 and parent.reqNum != 'testReqNum'
This is translated into the following plain SQL query (we can see it with the show_sql=true property):
select parent0_.id as id2_, parent0_.active as active2_, parent0_.creation_date as creation2_
from parent parent0_
inner join parent_childs childs1_ on parent0_.id=childs1_.Application_id
inner join child child2_ on childs1_.child_id=child2_.id
where (child2_.type in (? , ?)) and child2_.email=? and parent0_.active=? and parent0_.req_num<>?
Our parent table has only 2 parents that satisfy the condition "parent0_.active=? and parent0_.req_num<>?". Each of them has two children, and only one child of each parent satisfies the condition "(child2_.type in (? , ?)) and child2_.email=?".
So, when we execute the SQL query directly against our Oracle database, it returns only 2 rows (the 2 parents, with only one child each).
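To pin down what "2 rows, one child each" should turn into on the Java side, here is a plain-Java sketch (no JPA involved) that folds joined rows into parents keyed by id. The JoinRow record, the RowFolding class, and the ids are hypothetical. Keying by id matters because, in Hibernate versions before 6, a join fetch without DISTINCT returns the root entity once per joined row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of the SQL result: the parent's id plus the id of the child that
// matched the WHERE clause (hypothetical shape; the other columns are omitted).
record JoinRow(long parentId, long childId) {}

final class RowFolding {
    // Groups joined rows by parent id, keeping the order in which parents first appear.
    // Each parent ends up with only the children that appeared in its rows,
    // i.e. only the children that satisfied the WHERE clause.
    static Map<Long, List<Long>> fold(List<JoinRow> rows) {
        Map<Long, List<Long>> childrenByParent = new LinkedHashMap<>();
        for (JoinRow row : rows) {
            childrenByParent
                    .computeIfAbsent(row.parentId(), id -> new ArrayList<>())
                    .add(row.childId());
        }
        return childrenByParent;
    }

    public static void main(String[] args) {
        // Two parents with one matching child each, as described above (ids made up).
        List<JoinRow> rows = List.of(new JoinRow(1, 11), new JoinRow(2, 21));
        System.out.println(fold(rows)); // {1=[11], 2=[21]}
    }
}
```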
However, in Java we are getting odd results, which vary depending on whether we use "inner join", "join" or "join fetch". For instance, we receive a list of three parents: one with one child, another with the other child (which does not satisfy the email condition), and a last one with both of its children.
We're wondering why we are experiencing this behaviour and, more importantly, how we can solve it.
Thanks. Kind regards.

As I mentioned in my comment, a many-to-many relationship is not necessary to implement your solution. Additionally, your link table is not really a link table, because it has its own id, so it must be considered another entity.
If you can't change the DB schema, you could implement the relationships among your entities this way:
a OneToMany relationship from Parent to ParentChild
a OneToOne relationship from ParentChild to Child
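A minimal plain-Java sketch of that model, assuming the link row is promoted to its own ParentChild class that carries child_order. JPA annotations are left out so the sketch compiles with the JDK alone; in a real mapping, Parent.links would get the OneToMany and ParentChild.child the OneToOne described above. All class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// CHILD row.
class Child {
    final long id;
    final String type;
    final String email;

    Child(long id, String type, String email) {
        this.id = id;
        this.type = type;
        this.email = email;
    }
}

// PARENT_CHILDS row promoted to its own class, so child_order becomes a field.
// In JPA, the OneToOne to Child would be mapped here (annotation omitted).
class ParentChild {
    final Parent parent;
    final Child child;
    final int childOrder;

    ParentChild(Parent parent, Child child, int childOrder) {
        this.parent = parent;
        this.child = child;
        this.childOrder = childOrder;
    }
}

// PARENT row; in JPA, links would be the OneToMany to ParentChild (annotation omitted).
class Parent {
    final long id;
    final List<ParentChild> links = new ArrayList<>();

    Parent(long id) {
        this.id = id;
    }

    // Creates the link entity, setting the back-reference to this parent.
    void addChild(Child child, int childOrder) {
        links.add(new ParentChild(this, child, childOrder));
    }

    // Returns the children sorted by child_order.
    List<Child> children() {
        return links.stream()
                .sorted(Comparator.comparingInt(link -> link.childOrder))
                .map(link -> link.child)
                .toList();
    }
}
```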

Related

Is it possible to have a hibernate generated ID provided to a non-linked entity?

I have two entities, which for the sake of convenience I'll call parent and child.
My child entity has a many-to-one relationship with the parent.
For each entity I also have a 'log' entity, which has mostly the same fields as the corresponding entity plus a record start and end timestamp. The idea is that I can track how the data in the parent and child entities changes over time.
My issue is that, while I have an integer field referencing the parent ID on the child log entity, I can't populate it on the initial run of my service, or when a new parent and its children are created, because no IDs exist at that point.
So my question is: is it possible to reference the parent ID in my child log table WITHOUT adding a OneToMany relationship to the parent in my log entity, so that I can filter by parentID and childID without a complex parent -> child -> childLog join?
You can query with JOIN ... ON between entities that have no mapped relationship (Hibernate supports this since version 5.1):
SELECT p
FROM PurchaseOrder p
JOIN PurchaseOrderItem i ON p.id = i.purchaseOrderId
GROUP BY p.id
See also: https://martinelli.ch/how-to-join-two-entities-without-mapped-relationship/
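What that query returns can be sketched in plain Java, assuming the item stores only its order's id as a plain field (the same shape as a child log entity holding a bare parentId). The inner join keeps only orders with at least one item, and GROUP BY p.id collapses the extra rows produced by orders with several items. The AdHocJoin class is hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

record PurchaseOrder(long id) {}

// No mapped association: the item only stores its order's id as a plain field.
record PurchaseOrderItem(long id, long purchaseOrderId) {}

final class AdHocJoin {
    // In-memory counterpart of:
    //   SELECT p FROM PurchaseOrder p
    //   JOIN PurchaseOrderItem i ON p.id = i.purchaseOrderId
    //   GROUP BY p.id
    // The inner join drops orders without items; grouping by id leaves one result per order.
    static List<PurchaseOrder> ordersWithItems(List<PurchaseOrder> orders, List<PurchaseOrderItem> items) {
        Set<Long> referencedOrderIds = items.stream()
                .map(PurchaseOrderItem::purchaseOrderId)
                .collect(Collectors.toSet());
        return orders.stream()
                .filter(order -> referencedOrderIds.contains(order.id()))
                .toList();
    }
}
```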

Strategy for writing to parent entity as well when storing child entities on a JPA schema with TABLE_PER_CLASS inheritance strategy

@Inheritance(strategy=InheritanceType.TABLE_PER_CLASS)
in the parent class lets me treat child entities as tables holding the same information as the parent plus some extra attributes. That's fine for me when reading (http://viralpatel.net/blogs/hibernate-inheritance-table-per-concrete-class-annotation-xml-mapping/).
However, when I store a child entity (currently only the child table is filled), I'd like the parent table to be updated as well, with an insert of the shared part.
Is it possible to configure this, or do I need to explicitly call childRepo.save(child) plus parentRepo.save(child) in a transaction for every write operation?
It would be, in a way, a mix of the TABLE_PER_CLASS and JOINED strategies. JOINED stores the shared info in the parent table and the extra attributes in the child table, whereas TABLE_PER_CLASS stores everything in the child table. I want child tables with all the information, while the parent table is kept in sync with all the shared info as well.
There are three inheritance strategies: SINGLE_TABLE, JOINED, and TABLE_PER_CLASS.
- TABLE_PER_CLASS: maps each concrete entity to its own table, which holds all of its columns, including the inherited ones.
- SINGLE_TABLE: the parent entity and its children are mapped to a single table (in the DB you get only one table for the whole inheritance hierarchy).
- JOINED: this is the one you need. The shared attributes are persisted in the parent table, and the child-specific attributes are persisted in the child table.
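The JOINED layout from the last bullet can be pictured with a toy in-memory model (not Hibernate code). The Vehicle/Car hierarchy, its columns, and the JoinedLayout class are hypothetical; the point is that saving one subclass instance writes the shared column to the parent table and the extra column to the child table under the same id.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the JOINED layout (not Hibernate code).
// Hypothetical hierarchy: Vehicle (shared column "name") <- Car (extra column "doors").
final class JoinedLayout {
    // Each "table" maps primary key -> row (column name -> value).
    final Map<Long, Map<String, Object>> vehicleTable = new HashMap<>();
    final Map<Long, Map<String, Object>> carTable = new HashMap<>();

    // Saving one Car means two inserts under the same id:
    // the shared column goes to the parent table, the extra column to the child table.
    void saveCar(long id, String name, int doors) {
        vehicleTable.put(id, Map.of("name", name));
        carTable.put(id, Map.of("doors", doors));
    }

    // Loading a Car joins both rows on the shared primary key.
    Map<String, Object> loadCar(long id) {
        Map<String, Object> row = new HashMap<>(vehicleTable.get(id));
        row.putAll(carTable.get(id));
        return row;
    }
}
```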
To add to the three inheritance strategies (SINGLE_TABLE, TABLE_PER_CLASS and JOINED):
Regarding inserts, with one table per class hierarchy (SINGLE_TABLE), each subclass gets its own insert into the same table, with a different discriminator value. The table will also contain NULLs in the columns that belong to the other subclasses. Selects for a specific subclass filter on its discriminator value. If union-subclass (TABLE_PER_CLASS) is used, a polymorphic select is a single query with UNION.
With the JOINED strategy, there are two inserts, one each for the parent and the child table. Selects join the parent table to the child table with a left outer join.
So if you want to use table-per-class, you might use either union-subclass, with generator class "assigned" and polymorphism="implicit", or joined-subclass.

JPA/Hibernate query: load eagerly with subselect

Situation: the "Parent" entity has multiple "Child" entities (@OneToMany, lazy) in a two-way relationship. There is no foreign key field ("Child#parentId") on the entity.
Goal: avoid the N+1 problem by retrieving a fully loaded Parent collection using subselects. If I understand subselect fetching correctly, this is my goal (2 resulting SQL queries):
select * from Parent ...;
select * from Child where parent_id in ...;
Question 1: What is the best practice to achieve this? Could you provide examples in both JPQL/HQL and Criteria?
Question 2 (bonus): Can the API split the second query into batches, e.g. limited to 500? If the first query loads 1000 Parents, query 2a would load the Children for the first 500 Parents and query 2b for the next 500.
I have tried the following. Both attempts result in SQL JOINs; it seems that I cannot use the Child's foreign key without a JOIN.
// 2nd query:
criteria
.createAlias("parent", "p")
.add(Property.forName("p.id")
.in(parentCriteria.setProjection(Projections.property("id"))))
.list();
// 2nd query (manual):
criteria
.createAlias("parent", "p")
.add(Property.forName("p.id").in(parentIdList))
.list();
Update (2015-04-05)
I confirmed that it does work in EclipseLink via a query hint:
query.setHint("eclipselink.batch.type", "EXISTS");
This link http://blog.ringerc.id.au/2012/06/jpa2-is-very-inflexible-with-eagerlazy.html suggests that this is not possible via Hibernate and recommends manual fetching. However, I cannot see how to achieve that via HQL or Criteria, specifically how to get the child.parent_id column, which is not mapped on the entity but exists only in the database, while avoiding the JOIN that child.parent.id would produce.
To avoid N+1 queries, you can annotate the relationship with
@BatchFetch(BatchFetchType.JOIN) in EclipseLink, or
@BatchSize in Hibernate.
Inside queries, you can add fetch to the join clause:
select p from Parent p join fetch p.children c where ...
You can also add query hints:
query.setHint("eclipselink.batch", "p.children");
Or use EntityGraphs.
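For Question 2, Hibernate's @BatchSize(size = 500) on the collection lets Hibernate initialize the lazy collections of up to 500 owners per statement when one of them is accessed. If you fetch manually instead, the batching can be sketched as below; the child table and parent_id column names are assumptions, and each generated statement uses one bind placeholder per id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Manual batching sketch: split parent ids into chunks and build one
// "children of these parents" query per chunk (table/column names assumed).
final class ChildBatches {
    static List<List<Long>> partition(List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            // subList returns a view; the last batch may be smaller than batchSize.
            batches.add(ids.subList(from, Math.min(from + batchSize, ids.size())));
        }
        return batches;
    }

    // One SQL statement per batch, with a "?" placeholder for each id to bind.
    static String childQueryFor(List<Long> batch) {
        String placeholders = batch.stream().map(id -> "?").collect(Collectors.joining(", "));
        return "select * from child where parent_id in (" + placeholders + ")";
    }
}
```

With 1000 parent ids and a batch size of 500, this yields two statements, matching the 2a/2b split described in the question.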

Querying table with subclasses-like relationships

In the database of my app there are currently 3 tables:
Parent Table - (general goal)
ChildA
ChildB
If I were to speak in terms of OOP, both ChildA and ChildB are "subclasses" of the Parent table, however they are not similar.
The Relationships between the tables:
A row in the Parent Table has an integer that defines whether the row is related to type A (ChildA) or type B (ChildB).
In both ChildA and ChildB there is a reference (id) to the related row in the Parent table. Each child is related to exactly one Parent row, and each parent is related to one child (a one-to-one relationship).
No column name appears in more than one of the tables.
What I'm trying to do is to basically retrieve all of the rows in the Parent table, then according to the type column of each row to retrieve additional related info from either ChildA or ChildB.
This would be very easy to do if I were to first retrieve all of the parent rows, and then run through the rows with a loop and query n times for every row, but that would probably be highly inefficient, I guess.
I was wondering whether there is a better approach to this, perhaps even in a single query.
I know I could use INNER JOIN or something, but I'm not sure how it'd work in this case where I need to join 2 tables with a third one (and where the columns are different both in number and content).
So the question is: what would be the most efficient way to perform it?
EDIT:
I saw this question was marked as a duplicate of another question, however, I do not ask how to design my database, but how to QUERY it.
I'm using a Table-Per-Type design, and would like to get all of the rows from all of the different types (currently 2).
I would know how to do so in a case where I wanted to get all of the rows from a single type, but not in this situation, which is why I'm asking whether and how it would be possible with a single query (with a mechanism similar to JOIN for example). I know I could achieve it by querying twice, but I'd like to learn a more efficient way to do it.
I can think of two different approaches (with their pluses and minuses :)
1) Have as many queries as subtypes and retrieve a subtype at a time. In the example case, you will have two queries:
select * from ChildA where id in (select childId from Parent where childType='A')
select * from ChildB where id in (select childId from Parent where childType='B')
This will give you the lowest possible data transfer between your application and the database at a relatively reasonable performance. You will "waste" the effort your database makes to filter the Parent table (the database will have to do it twice)
2) You have one query which retrieves both ChildA and ChildB as part of the same result set like this:
select ChildA.*, ChildB.* from Parent
left outer join ChildA on Parent.ChildId=ChildA.id
left outer join ChildB on Parent.ChildId=ChildB.id
The above query only works if children have unique ids across both tables (that is, if there is a ChildA with id 5, there is no ChildB with id 5). If this is not the case, you need a slightly "uglier" query that puts the type check into the join conditions:
select ChildA.*, ChildB.* from Parent
left outer join ChildA on Parent.ChildType='A' and Parent.ChildId=ChildA.id
left outer join ChildB on Parent.ChildType='B' and Parent.ChildId=ChildB.id
This will give you a result set which contains all columns from both ChildA and ChildB with many NULL values (for each ChildA record, all ChildB columns would be NULL). This way, you have a single query (which may perform faster than the multiple queries in the first approach), but you need to ship more data.
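Approach 1 can be sketched in plain Java, with maps standing in for the tables: collect the child ids per subtype, "query" each child table once, then stitch the results back onto the parents. The records, the char-valued type column, and the TablePerTypeLoader class are assumptions for illustration. Because lookups are done per type, equal ids in ChildA and ChildB do not collide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical rows; the question stores the type as an integer, a char is used here for readability.
record ParentRow(long id, char childType, long childId) {}
record ChildA(long id, String aInfo) {}
record ChildB(long id, String bInfo) {}
record ParentWithChild(ParentRow parent, Object child) {}

final class TablePerTypeLoader {
    // Stand-in for "select * from ChildX where id in (...)": one pass per child table.
    static <T> Map<Long, T> fetchByIds(Map<Long, T> table, Set<Long> ids) {
        Map<Long, T> hits = new HashMap<>();
        for (Long id : ids) {
            T row = table.get(id);
            if (row != null) {
                hits.put(id, row);
            }
        }
        return hits;
    }

    static Set<Long> childIdsOfType(List<ParentRow> parents, char type) {
        return parents.stream()
                .filter(p -> p.childType() == type)
                .map(ParentRow::childId)
                .collect(Collectors.toSet());
    }

    // Approach 1: one "query" per subtype, then attach each child to its parent in memory.
    static List<ParentWithChild> load(List<ParentRow> parents,
                                      Map<Long, ChildA> childATable,
                                      Map<Long, ChildB> childBTable) {
        Map<Long, ChildA> childrenA = fetchByIds(childATable, childIdsOfType(parents, 'A'));
        Map<Long, ChildB> childrenB = fetchByIds(childBTable, childIdsOfType(parents, 'B'));
        List<ParentWithChild> result = new ArrayList<>();
        for (ParentRow p : parents) {
            // Look up only in the map for the parent's own type.
            Object child = p.childType() == 'A' ? childrenA.get(p.childId()) : childrenB.get(p.childId());
            result.add(new ParentWithChild(p, child));
        }
        return result;
    }
}
```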
You can write this as a single join:
select * from (a left outer join b on a.key = b.fg_key) left outer join c on a.key = c.fg_key
I am not 100% sure about the placement of the opening parenthesis, but I remember using this before.
But as the number of subtypes grows, this becomes harder and harder to maintain properly, because all column names must be aliased in the query. It would be easiest to perform a two-step load: first load all items of type 1, then those of type 2. This keeps the code cleaner and easier to maintain.
Efficiency-wise, I don't expect this to be much slower than a single query. I would suggest going with the multiple-query variant in the first version and optimizing when performance becomes an issue.
Basically, this kind of requirement is something we should try to handle in our program rather than in the database.
However, I tried the query below, which outputs the value from one of the child tables and NULL for the other:
select parent.id, parent.description, (select name from car where id = parent.id) as child1, (select Name from bike where id = parent.id) as child2 from vehicle parent;
Hope this will help you.

Hibernate uses initial WHERE clause in subsequent queries

In using Hibernate's JPA implementation, I noticed an interesting optimization behavior. Within the same transaction, the initial JPA query's WHERE clause is used for subsequent queries involving the results of the initial query.
For example, Person has a lastName and a set of owned books.
// (1) get person by last name
Query q = entityManager.createQuery("SELECT p FROM Person p WHERE p.lastName = :lastName");
q.setParameter("lastName", "Smith");
List<Person> persons = q.getResultList();
// (2) get books owned by some arbitrary person in persons
Person person = persons.get(n);
Collection<Book> books = person.books;
(1) translates to the SQL:
SELECT ... FROM Person WHERE lastName = 'Smith'
When (2) is run and accesses Person's books, it generates the SQL:
SELECT ... FROM Person_Book book0_ WHERE book0_.personId IN (SELECT ... FROM ... WHERE lastName = 'Smith')
Somehow the WHERE clause from (1) is remembered and used in subsequent queries (2) involving the retrieved person. What is this behavior in Hibernate called and how can I configure it?
Follow-up: I'm using subselect fetching on the person's books, which explains the behavior I'm seeing.
Extracted from this link:
The last form of fetching I want to cover is subselect fetching. Subselect fetching is very similar to batch size controlled fetching, which I just described, but takes the 'numerical complications' out of the equation. Subselect fetching is actually a different type of fetching strategy that is applied to collection style associations. Unlike join style fetching, however, subselect fetching is still compatible with lazy associations. The difference is that subselect fetching just gets "the whole shootin' match" as a co-worker of mine would say, rather than just a batch. In other words, it uses subselect execution to pass the ID set of the main entity set into the select off of the association table:
select * from owner
select * from pet where owner_id in (select id from owner)
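What the second query computes can be mimicked in plain Java: the owner filter is reapplied to get the owner ids, and all matching pets are loaded in one pass and grouped per owner, so every owner's collection is filled at once. The Owner/Pet records and the SubselectFetch class are hypothetical; in Hibernate itself this strategy is enabled with @Fetch(FetchMode.SUBSELECT) on the collection.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

record Owner(long id, String lastName) {}
record Pet(long id, long ownerId, String name) {}

// Toy model of subselect fetching (not Hibernate code).
final class SubselectFetch {
    // Mirrors: select * from pet where owner_id in (select id from owner where <ownerFilter>)
    static Map<Long, List<Pet>> petsByOwner(List<Owner> ownerTable, List<Pet> petTable, Predicate<Owner> ownerFilter) {
        // Re-run the original owner filter to get the ids, as the subselect does.
        Set<Long> ownerIds = ownerTable.stream()
                .filter(ownerFilter)
                .map(Owner::id)
                .collect(Collectors.toSet());
        // One pass over the pet table fills the collections of all selected owners.
        return petTable.stream()
                .filter(pet -> ownerIds.contains(pet.ownerId()))
                .collect(Collectors.groupingBy(Pet::ownerId));
    }
}
```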
