Duplicated results in Hibernate OneToMany List


I have mapped a 1:N relation with an @OneToMany List, but when I access the list, the results are duplicated because of an OUTER JOIN.
This is what the mapping looks like:
import java.util.*;
import javax.persistence.*;

@Entity
public class Programmer {
    // @Id and other fields omitted in the question
    @ElementCollection(fetch = FetchType.EAGER)
    @CollectionTable(name = "emails", joinColumns = @JoinColumn(name = "id", nullable = false))
    @Column(name = "email", nullable = false)
    protected Set<String> emails = new HashSet<String>();

    @OneToMany(mappedBy = "programmer", fetch = FetchType.EAGER)
    private List<Game> games = new ArrayList<Game>();
}
When I get the attribute with prog.getGames(), the results come back duplicated because the SQL that Hibernate generates uses OUTER JOINs:
from programmer
left outer join emails on programmer.id=emails.id
left outer join game on programmer.id=game.id
where programmer.id=?
Is there any solution that doesn't involve turning the List into a Set? I need to get the games with prog.getGames(); I cannot use custom HQL or Criteria.
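The duplicates come from the two collections being fetched in one outer-join query. The following plain-Java sketch (no Hibernate involved) makes that visible: it models one programmer's joined rows as the cartesian product of its emails and games, then collects the games from every row, first into a List and then into a Set. The class and method names are hypothetical, and the row order is a simplification, since a real database does not guarantee it.

```java
import java.util.*;

public class JoinDuplication {
    // Simplified model of: programmer LEFT OUTER JOIN emails LEFT OUTER JOIN game
    // for a single programmer. Each row pairs one email with one game.
    static List<String[]> joinRows(List<String> emails, List<String> games) {
        List<String[]> rows = new ArrayList<>();
        for (String email : emails) {
            for (String game : games) {
                rows.add(new String[] {email, game});
            }
        }
        return rows;
    }

    // Collects games the way a List-backed (bag) collection would:
    // one element per joined row, so each game repeats once per email.
    static List<String> gamesAsList(List<String[]> rows) {
        List<String> games = new ArrayList<>();
        for (String[] row : rows) {
            games.add(row[1]);
        }
        return games;
    }

    // Collects the same rows into a Set, which drops the repeats.
    static Set<String> gamesAsSet(List<String[]> rows) {
        Set<String> games = new LinkedHashSet<>();
        for (String[] row : rows) {
            games.add(row[1]);
        }
        return games;
    }

    public static void main(String[] args) {
        List<String[]> rows = joinRows(List.of("a@x.org", "b@x.org"), List.of("Chess", "Go", "Tetris"));
        System.out.println(rows.size());       // 6
        System.out.println(gamesAsList(rows)); // [Chess, Go, Tetris, Chess, Go, Tetris]
        System.out.println(gamesAsSet(rows));  // [Chess, Go, Tetris]
    }
}
```

With two emails, every game shows up twice in the List; the Set hides this, which is why switching to a Set "works" without changing the query that produces the extra rows.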

While using a Set<> does resolve your issue, I'd argue it is simply a band-aid: it gets you the results you expect, but it doesn't address the underlying problem.
You should ultimately be using the default lazy fetch strategy. In my opinion, whether to eagerly load an association, particularly a collection-based one, depends on the query, so it should be toggled when you construct that specific query rather than baked into your entity mapping as you're doing.
Consider a future query where you're only interested in attributes of the aggregate root entity. Your mapping will still force those associations to be fetched eagerly, so you'll consume additional resources: a larger persistence context means more memory consumption, and the database performs joins for data you aren't going to use.
If there are multiple collections that you need to hydrate, I would recommend considering FetchMode.SUBSELECT instead.
If we assume your query has 10 entities being returned, the default lazy strategy with 2 collections would issue 21 queries (1 for the base result set and 2 for each loaded entity).
The benefit of SUBSELECT is that Hibernate will issue only 3 queries (1 for the base result set and 1 per collection, each loading the collection elements for all entities). Depending on the query, splitting one query with left joins into 3 queries could even perform better at the database level.
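The query counts above follow from simple arithmetic. This sketch assumes every collection of every loaded entity is actually accessed (otherwise lazy loading issues fewer queries); the class and method names are made up for illustration.

```java
public class FetchQueryCounts {
    // Default lazy SELECT fetching: one query for the root entities, then one
    // query per collection per loaded entity as each collection is touched (N+1).
    static int lazySelectQueries(int entities, int collections) {
        return 1 + entities * collections;
    }

    // FetchMode.SUBSELECT: one query for the root entities, then one query per
    // collection that loads the elements for all root entities at once.
    static int subselectQueries(int collections) {
        return 1 + collections;
    }

    public static void main(String[] args) {
        System.out.println(lazySelectQueries(10, 2)); // 21
        System.out.println(subselectQueries(2));      // 3
    }
}
```

Note that the SUBSELECT count does not grow with the number of root entities, which is where its advantage comes from.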

I've resolved this problem with @Fetch(FetchMode.SUBSELECT):

@OneToMany(mappedBy = "user", fetch = FetchType.EAGER, cascade = CascadeType.ALL)
@Fetch(FetchMode.SUBSELECT)
private List<CompanyUserEntity> companyUserRelations;
I had the same problem: companyUserRelations contained duplicate entries (I mean references to the same objects, not duplicated data).
So after reading @dimitry's response, I added @Fetch(FetchMode.SUBSELECT) and it worked.

Related

Hibernate @ManyToOne/@JoinColumn optimization

I have a Hibernate entity that is composed of many other entities used within the application. The other entities that make up this MainEntity are joined using @ManyToOne and @JoinColumn. The MainEntity class has 5 columns (@Column) and 7 @ManyToOne/@JoinColumn associations.
I seem to be running into performance issues when retrieving all of these MainEntity instances. We want to serialize the MainEntity to JSON along with the entities associated with it. Note that we aren't retrieving many of them: fewer than 30 in total.
Below is an example of what the class looks like, along with my findAll() method to retrieve the instances. I know that @ManyToOne is EAGER by default, so I'm wondering if there's a better way to get all of these entities that is easier on the system. Thank you in advance.
@Entity(name = "MainEntity")
@Table(name = "main_entity")
public class MainEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    private Integer id;

    // Other @Column fields defined here

    @ManyToOne()
    @JoinColumn(name = "entity_1_id")
    private Entity1 entity1;

    @ManyToOne()
    @JoinColumn(name = "entity_2_id")
    private Entity2 entity2;

    @ManyToOne()
    @JoinColumn(name = "entity_3_id")
    private Entity3 entity3;

    // ... and so on, for a total of 7 @ManyToOne() associations
}
Here is the findAll() method that I have:
final List<E> findAllOrdered(Class<E> clazz, Order order) {
    final Session session = sessionManager.openNewSession();
    try {
        return session.createCriteria(clazz)
                .addOrder(order)
                .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
                .list();
    } finally {
        sessionManager.closeSession(session);
    }
}
I found myself having to add Criteria.DISTINCT_ROOT_ENTITY because we were getting duplicate MainEntity results whenever a child had multiple entities associated with it. I suspect this is a big part of my performance problem.
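To illustrate why DISTINCT_ROOT_ENTITY is a symptom rather than a fix, here is a plain-Java sketch of an in-memory "distinct root entity" step: it keeps the first occurrence of each object, compared by reference rather than equals(), and preserves order. My understanding is that Hibernate's transformer also deduplicates by object identity, but this code is modeled on that idea, not taken from Hibernate, and the class name is hypothetical. The point is that the duplicate rows are still produced by the SQL join and transferred to the application before they are collapsed.

```java
import java.util.*;

public class DistinctRootSketch {
    // Keeps the first occurrence of each object, comparing by reference (==)
    // rather than equals(), and preserves the original order.
    static <T> List<T> distinctByIdentity(List<T> rows) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<T> result = new ArrayList<>();
        for (T row : rows) {
            if (seen.add(row)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Object main1 = new Object();
        Object main2 = new Object();
        // Joined result: main1 appears once per associated child row
        List<Object> rows = List.of(main1, main1, main1, main2);
        System.out.println(rows.size());                     // 4 rows fetched
        System.out.println(distinctByIdentity(rows).size()); // 2 entities returned
    }
}
```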

If you are getting data you don't want in the response and want to filter it out, you can use Jackson's @JsonIgnore on the association, e.g.:
@ManyToOne()
@JoinColumn(name = "entity_1_id")
@JsonIgnore
private Entity1 entity1;

A few pointers to consider:
Consider making associations LAZY by default, unless you really want to load all the association data (and its associations) along with the parent.
Use fetch joins (JOIN FETCH) in HQL/Criteria, based on which associations you really want to fetch and to what depth.
Or use an EntityGraph to decide which associations to fetch.
Enable show_sql, as this shows how many SQL statements are fired at the DB and exactly what they are. This is a good starting point; subsequently you can tune your associations (LAZY/EAGER, SELECT/JOIN/SUBSELECT) based on your use case.
You can run these queries against the DB directly and see whether tuning the query or the DB (indexes, partitioning, etc.) helps reduce query times.
See if the second-level cache would help for your use case. Note that the second-level cache comes with its own complexity and extra overhead, especially if the data is transactional rather than mostly read-only. With the application deployed on multiple nodes, maintaining cache coherence is another aspect to think about. You need to validate whether the extra overhead and complexity are really worth the efficiency gained from the second-level cache.
From an application design perspective, also consider whether you really need to retrieve the MainEntity and its associations in a single request or UI. Instead, you could first show the MainEntity with paging and, based on the selection, fetch the associations for that MainEntity (also with paging).
Note that this is not a complete list, but it is a good starting point; based on your use case, you can see which of these (or other) techniques fit.
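For the show_sql pointer, the settings below are standard Hibernate properties for inspecting the generated SQL. How they reach Hibernate (persistence.xml, hibernate.cfg.xml, Spring configuration, or a Map passed to Persistence.createEntityManagerFactory) depends on your setup, so the Properties object here is just an assumed container and the class name is hypothetical.

```java
import java.util.Properties;

public class SqlLoggingSettings {
    // Standard Hibernate settings for seeing which SQL is issued and how often.
    static Properties sqlDebugProperties() {
        Properties props = new Properties();
        props.setProperty("hibernate.show_sql", "true");            // print each SQL statement
        props.setProperty("hibernate.format_sql", "true");          // pretty-print the SQL
        props.setProperty("hibernate.generate_statistics", "true"); // collect session/query statistics
        return props;
    }

    public static void main(String[] args) {
        sqlDebugProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These are meant for diagnosis; statistics and SQL logging add overhead, so they are usually switched off again in production.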

Hibernate - how to avoid the n+1 issue while keeping a good throughput with #ManyToOne association?

I have a class Entry which has two fields serving auditing purposes: startAuditAction and endAuditAction. One audit action can affect several entries, so the Entry class declares @ManyToOne relationships as follows:
public class Entry {
    @Id
    @Column(nullable = false)
    protected String path;

    @Id
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(nullable = false, name = "start_action_id")
    protected AuditAction startAction;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(updatable = true, nullable = true, name = "end_action_id")
    protected AuditAction endAction;
}
I want to retrieve instances of Entry based on conditions on the path field and the audit fields. For example, to retrieve entries that have not yet been deleted, the HQL would look something like this:
SELECT DISTINCT entry FROM ENTRY_TABLE entry JOIN FETCH entry.startAction startAct LEFT JOIN FETCH entry.endAction endAct WHERE entry.path LIKE :myPath AND endAct IS NULL
I am using lazy loading together with JOIN FETCH to avoid the N+1 problem while still being able to access the audit fields. However, I have two problems with this:
Firstly, this does not seem clean to me: if I know I want to access the audit fields (namely the audit actions' timestamps), then they should not be lazy loaded. But if I use eager loading, I face the N+1 problem even if I use JOIN FETCH (and in that case I do not understand why fetch = FetchType.EAGER would ever be useful).
Secondly, even though I am avoiding the N+1 problem and therefore firing fewer SQL queries, I see performance issues in the overall use of my database, probably because of the joins.
What is the proper way to avoid firing additional queries while preserving good throughput?
Thanks!

1- Using JOIN FETCH is useful when a field is FetchType.LAZY and you know you'll need it in a specific case, whereas FetchType.EAGER forces that association to always be loaded, independently of the query
(e.g. with your configuration you can run multiple different queries and use JOIN FETCH only in the ones that need the association).
2- You probably have problems somewhere else; I doubt the join is what is slowing you down.

JPA cascade for unidirectional relationships without loading everything

So I have some entities that are used as the basis for a coordinate system; for the purpose of this post we'll call them A, B, C and D. Each of these entities has multiple @OneToMany relationships, and I want to cascade deletes, i.e. when some A is deleted, all entities in each of its @OneToMany relationships are deleted too. Fairly standard stuff.
However, I don't see the point in having these entities explicitly track these relationships when all I want to do is cascade a delete. Nor do I see the point in loading all these entities (potentially millions!) into memory each time a new entity is added to a @OneToMany relationship (lazy loading only loads the collection when it's accessed, but it is of course accessed when a new entity is added to the relationship).
Let's add a little example:
@Entity
public class A {
    @Id
    private long id;
    // ... other fields ...
    @OneToMany
    private Collection<SomeClass> collection;
}

@Entity
public class SomeClass {
    @Id
    private long id;
    // ... other fields ...
    @ManyToOne
    A a;
    @ManyToOne
    B b;
    // ... likewise for C, D ...
}
There can be multiple classes similar to SomeClass, and so multiple @OneToMany relationships in A (and B, C, D) that require tracking. This gets tedious FAST. Also, every time a new instance of SomeClass is added, I'd need to load the entire collection, and this seems exceedingly inefficient (I'd pretty much end up with my entire database loaded into memory just to cascade a delete!!!).
How can I achieve what I want without modifying the underlying database (e.g. specifying ON DELETE CASCADE in the table definition)? Surely the designers of JPA have considered such a use case. Maybe I'm wrong that I'd need to load the entire collection when adding an entity to the relationship (if so, please explain why :) ).
A similar question was asked here: JPA: unidirectional many-to-one and cascading delete but it doesn't have a satisfactory solution, and it doesn't discuss whether or not the entire relationship gets loaded into memory.

To achieve a multi-level cascade without initializing all the entities you can only use a DB cascade.
There's no other way! That's why you couldn't find a satisfactory solution.
As for this part:
"Also, every time a new instance of SomeClass is added, I'd need to load the entire collection and this seems exceedingly inefficient (I'd pretty much end up with my entire database loaded into memory just to cascade a delete!!!)."
You need to understand the taxonomy of unidirectional collections:
Adding one element to a Set requires the whole collection to be initialized, to enforce the Set uniqueness contract.
A java.util.Collection or an unindexed List means you have a Bag, which is very inefficient in the unidirectional case. For inverse collections Bags are fine, but that's outside your current context.
An indexed List (where the order is materialized in the database) is what you might be looking for:
@OrderColumn(name = "orders_index")
public List<Order> getOrders() { return orders; }
The indexed List uses the index key for add/remove/update operations. Unlike a Bag, which simply deletes all elements and recreates the collection with the remaining ones, an indexed List uses the index key to remove only the elements that no longer belong to the List.
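To put rough numbers on the Bag versus indexed List comparison, here is a simplified statement-count model for removing one element from a unidirectional collection. It assumes the Bag behavior described above (delete every row for the owner, then re-insert the survivors), covers only removing the last element of an indexed List, and ignores JDBC batching; the class is hypothetical.

```java
public class CollectionRemovalCost {
    // Bag: one DELETE for all of the owner's join-table rows,
    // then one INSERT per element that is still in the collection.
    static int bagStatements(int size) {
        int deleteAll = 1;
        int reinserts = size - 1;
        return deleteAll + reinserts;
    }

    // Indexed List, removing the element at the last index: the row is
    // addressed by (owner id, index), so a single DELETE is enough.
    // Removing from the middle would also shift the indexes of later rows.
    static int indexedListTailRemovalStatements() {
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(bagStatements(1000));                // 1000
        System.out.println(indexedListTailRemovalStatements()); // 1
    }
}
```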

JPA SortedSet is not being resorted after persist

I have an entity with the following field:
@ManyToMany(cascade = { CascadeType.ALL }, targetEntity = Comment.class)
@JoinTable(name = "program_to_comment")
@OrderBy("position")
private Set<Comment> comments = new HashSet<Comment>();
but the problem is that whenever I persist it using:
entityManager.persist(entity); // persist() returns void; 'entity' itself becomes the managed instance
Program p = entity;
the field keeps the objects in the same order they had in the entity object.
Suppose the entity object is configured as follows: Program(comments:[Comment(position:15), Comment(position:10)], ...). Persisting the entity (entityManager.persist) stores both comments and the program entity itself in the database. But the resulting entity after the persist call is Program(comments:[Comment(position:15), Comment(position:10)], ...), in the same order given to the persist method.
From my point of view, at this point the resulting entity should present the values following the specified @OrderBy rule. Or am I missing something?
Additional information:
JPA2
Hibernate 4.2.0.Final

@OrderBy simply adds an ORDER BY clause to the query used to load the comments of a program. Nothing more. The rest is your responsibility: if you want the comments sorted by position when adding comments and persisting them, you have to take care of that yourself.
I have personally never found this annotation to be really useful. I have also found that it doesn't work in every case, particularly when using a query to fetch programs with their comments that already contains an ORDER BY clause. I generally prefer not to use this annotation and instead provide a getSortedComments() method that returns a sorted set or list of comments, using a comparator.
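A minimal sketch of the getSortedComments() approach, as a plain-Java stand-in for the Program entity (no JPA annotations). ProgramSketch and its nested Comment type are hypothetical; only the position field from the question is modeled. It returns a List sorted with a Comparator, which is one of the two options mentioned above.

```java
import java.util.*;

public class ProgramSketch {
    // Hypothetical minimal Comment type; only the position matters here.
    static class Comment {
        final int position;
        Comment(int position) { this.position = position; }
        int getPosition() { return position; }
    }

    private final Set<Comment> comments = new HashSet<>();

    void addComment(Comment comment) { comments.add(comment); }

    // Returns the comments ordered by position, regardless of how the
    // underlying Set was populated, persisted or loaded.
    List<Comment> getSortedComments() {
        List<Comment> sorted = new ArrayList<>(comments);
        sorted.sort(Comparator.comparingInt(Comment::getPosition));
        return sorted;
    }

    public static void main(String[] args) {
        ProgramSketch program = new ProgramSketch();
        program.addComment(new Comment(15));
        program.addComment(new Comment(10));
        for (Comment c : program.getSortedComments()) {
            System.out.println(c.getPosition()); // 10, then 15
        }
    }
}
```

Because the ordering is computed on every call, it stays correct right after persist(), when no SELECT (and therefore no ORDER BY) has run.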

Multiple #ManyToMany sets from one join table

I'm mapping a proprietary database to Hibernate for use with Spring. In it, there are a couple of join tables that, for entities A and B, have the following schema:
CREATE TABLE AjoinB (
idA int not null,
idB int not null,
groupEnum enum ('groupC', 'groupD', 'groupE'),
primary key(idA, idB, groupEnum)
);
As you can see, this indicates that there can be multiple A-B relationships that put them in different groups. I'd like to end up with the following sets, the first line in entity A and the second in entity B:
Set<B> BforGroupC, BforGroupD, BforGroupE;
Set<A> AforGroupC, AforGroupD, AforGroupE;
So far, I've only managed to put them in one set and disregard the groupEnum relationship attribute:
@ManyToMany(targetEntity = B.class, cascade = { CascadeType.PERSIST, CascadeType.MERGE })
@JoinTable(name = "AjoinB", joinColumns = @JoinColumn(name = "idA"), inverseJoinColumns = @JoinColumn(name = "idB"))
private Set<B> BforAllGroups;
and
@ManyToMany(mappedBy = "BforAllGroups", targetEntity = A.class)
private Set<A> AforAllGroups;
How can I make multiple sets where they belong either in groupC, groupD or groupE?
Cheers
Nik

If you're considering doing this, don't. Tables are cheap nowadays, what with the economy and all, so just create one per association; it'll be so much easier.
If you're bound by a legacy database and you can't change the structure of that table, I would:
Consider skaffman's solution first (+1, btw). Depending on your target database, you may be able to write a trigger for your views that inserts the appropriate "discriminator" value.
If the above isn't possible in your DB, another solution is to use custom SQL for CRUD operations on your collections. Keep in mind that this will NOT work (i.e. your "discriminator value" won't get applied) for complex HQL queries that involve the association as part of a condition. You can also mix and match this with the above, e.g. use views and use custom SQL for insert/delete.
If both of the above fail, go with "association as a separate entity" as suggested by framer8. That's going to be rather ugly (since we're assuming you can't change your tables) due to composite keys and all the extraneous code. It may, in fact, be impossible if any of your associations allows duplicates.

To my knowledge, Hibernate cannot use such a "discriminator" column in the way that you want. Hibernate requires a join table for each of them.
Perhaps you might be able to define additional views on the table, showing each of the groupings?

I think the advice whenever you need to access a field in a link table is to make the link table an object and a Hibernate entity in its own right. A would have a set of AtoB objects, and each AtoB would reference a B. I have a similar situation where the link table has a user associated with the link.
select joinTable.b from A a
left join a.AtoB joinTable
where joinTable.group = 'C'
It's not as elegant as having an implicit join done by hibernate, but it does give you the control you need.
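The link-entity idea can be sketched in plain Java (no JPA): A holds its AtoB links, each carrying the group, and the per-group sets are derived by filtering, which is the in-memory counterpart of the HQL above. All types here are hypothetical stand-ins that use Java 16+ records. In a real mapping, AtoB would be an entity over AjoinB with a composite key, and the enum constants are named after the groupEnum values only for readability.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class LinkEntitySketch {
    // Mirrors the groupEnum column values of AjoinB.
    enum Group { groupC, groupD, groupE }

    // Hypothetical stand-ins, not JPA entities.
    record B(int id) {}
    record AtoB(int idA, B b, Group group) {}

    static class A {
        final int id;
        final List<AtoB> links = new ArrayList<>();

        A(int id) { this.id = id; }

        // Derives one per-group "set" from the single link collection.
        Set<B> bForGroup(Group group) {
            return links.stream()
                    .filter(link -> link.group() == group)
                    .map(AtoB::b)
                    .collect(Collectors.toCollection(LinkedHashSet::new));
        }
    }

    public static void main(String[] args) {
        A a = new A(1);
        B b1 = new B(10);
        B b2 = new B(20);
        a.links.add(new AtoB(1, b1, Group.groupC));
        a.links.add(new AtoB(1, b2, Group.groupD));
        a.links.add(new AtoB(1, b2, Group.groupC));
        System.out.println(a.bForGroup(Group.groupC)); // [B[id=10], B[id=20]]
        System.out.println(a.bForGroup(Group.groupE)); // []
    }
}
```

Convenience getters such as a BforGroupC() could wrap bForGroup, but they would be read-only views; adding to a group means adding an AtoB link.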
