This may be super easy, but I can't find any clear statement about it in the JPA spec. If I have a List-based relationship without an @OrderBy annotation, e.g.:
@OneToMany
List<Child> children;
then what will the order of this list's elements be in Java? It seems reasonable that it will be the order of the corresponding records in the Child table, or of the entries in the intermediate table if it's many-to-many, but is that guaranteed behavior of JPA providers?
Order is not guaranteed by the specification as far as I know. @OrderBy is the way to go if you depend on the order.
EDIT: Quote from JPA 1.0 spec:
Portable applications should not expect the order of lists to be maintained across persistence contexts unless the OrderBy construct is used and the modifications to the list observe the specified ordering. The order is not otherwise persistent.
(Page 19, Footnote [4])
Also, when your entity has no children, the JPA spec doesn't say whether you get an empty list or null when you retrieve your List, so be sure to check for it to avoid NullPointerExceptions.
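A defensive pattern, assuming you control the entity class, is to initialize the collection field and have the getter fall back to an empty list. The `Parent` and `Child` classes below are hypothetical stand-ins with the JPA annotations left out, so the sketch runs without a provider:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a child entity (JPA annotations omitted).
class Child {
    final String name;
    Child(String name) { this.name = name; }
}

// Hypothetical stand-in for the owning entity (JPA annotations omitted).
class Parent {
    // Initialized so a freshly constructed instance never holds null.
    private List<Child> children = new ArrayList<>();

    // Falls back to an empty (read-only) list if the field was set to null.
    List<Child> getChildren() {
        return children == null ? Collections.emptyList() : children;
    }

    void setChildren(List<Child> children) { this.children = children; }
}
```

Initializing the field covers new instances; the null check in the getter covers code paths that explicitly set the field to null.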
There is no guaranteed behavior; the order can be different each time. It happened to me with Hibernate: the order changed when I refreshed the page.
You can order the results when you query, rather than relying on the definition of the class and its properties.
Related
I'm setting up a JPA Specification-based repository implementation that uses JPA Specifications (constructed from RSQL filter strings) to filter the results, define result ordering, and remove, via "distinct", any duplicates that would otherwise be returned due to joined tables. The JPA Specification builder method joins several tables and sets the "distinct" flag:
final Join<Object, Object> rootJoinedTags = root.join("tags", JoinType.LEFT);
final Join<Object, Object> rootJoinedLocations = root.join("location", JoinType.LEFT);
...
query.distinct(true);
To allow sorting by joined table columns, I've applied the "HINT_PASS_DISTINCT_THROUGH" hint to the relevant repository method (otherwise, sorting by joined table columns returns an error along the lines of "sort column must be included in the SELECT DISTINCT query").
@QueryHints(value = {
@QueryHint(name = org.hibernate.jpa.QueryHints.HINT_PASS_DISTINCT_THROUGH, value = "false")
})
Page<SomeEntity> findAll(@Nullable Specification<SomeEntity> spec, Pageable pageable);
The arguments for said repository method are constructed as follows:
final Sort sort = getSort(searchFilter);
final Specification spec = getSpecificationIfPresent(searchFilter);
final PageRequest pageRequest = PageRequest.of(searchFilter.getPageNumber(), searchFilter.getLimit(), sort);
return eventRepository.findAll(spec, pageRequest);
After those changes, filtering and sorting seem to work as expected. However, the hint seems to cause "distinct" filtering to be applied after the result page has already been constructed, reducing the number of entities returned in the page from the configured "size" PageRequest argument to whatever is left after the duplicates are filtered out. For example, if we make a PageRequest with "page=0" and "pageSize=10", the resulting Page may contain only 5 "SomeEntity" instances, although the database contains many more entries (177 entities, to be exact, in this case). If I remove the hint, the number of returned entities is correct again.
Question: is there a way to make the same Specification query setup work with correctly sized Pages (for example, some other hint that makes duplicate filtering happen before the Page object is constructed)? If not, is there another approach I could use to achieve the required Specification-based filtering, with joined-column sorting and duplicate removal as with "distinct"?
PS: PostgreSQL is the database behind the application in question
The problem you are experiencing has to do with the way you are using the HINT_PASS_DISTINCT_THROUGH hint.
This hint lets you tell Hibernate that the DISTINCT keyword should not be included in the SELECT statement issued against the database.
You are taking advantage of this fact to allow your queries to be sorted by a field that is not included in the DISTINCT column list.
But that is not how this hint should be used.
This hint should only be used when you are sure that applying the DISTINCT keyword to the SQL SELECT statement or not makes no difference, because the SELECT statement will already return distinct rows on its own. The idea is to improve query performance by avoiding an unnecessary DISTINCT.
This is usually the case when you use the query.distinct method in your criteria queries and join fetch child relationships. This great article by @VladMihalcea explains how the hint works in detail.
On the other hand, when you use paging, the SQL SELECT statement issued against the database will include OFFSET and LIMIT (or something similar, depending on the underlying database), limiting your query to a maximum number of results.
As stated, if you use the HINT_PASS_DISTINCT_THROUGH hint, the SELECT statement will not contain the DISTINCT keyword and, because of your joins, it could return duplicate records of your main entity. These records are then processed by Hibernate to identify duplicates, because you are using query.distinct, and it will in fact remove duplicates if needed. Since LIMIT was already applied to the joined rows before this in-memory step, I think this is why you may get fewer records than requested in your Pageable.
If you remove the hint, the DISTINCT keyword is passed in the SQL statement sent to the database. As long as you only project information of the main entity, the database removes duplicates before applying LIMIT, so it fetches all the records indicated by LIMIT, and this is why you always get the requested number of records.
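A toy model of the two orders of operations may make this concrete. It assumes a hypothetical result where every parent joins to exactly two tags, sorted by parent; `DistinctPaging` and its methods are illustrative names, not Hibernate or Spring Data APIs:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Toy model only, not Hibernate code: each joined row is (parentId, tag).
final class DistinctPaging {

    static final class Row {
        final long parentId;
        final String tag;
        Row(long parentId, String tag) {
            this.parentId = parentId;
            this.tag = tag;
        }
    }

    // Rows for parents 1..parentCount, each joined to tagsPerParent tags,
    // ordered by parent as a sorted SQL result would be.
    static List<Row> joinedRows(int parentCount, int tagsPerParent) {
        List<Row> rows = new ArrayList<>();
        for (long p = 1; p <= parentCount; p++) {
            for (int t = 0; t < tagsPerParent; t++) {
                rows.add(new Row(p, "tag" + t));
            }
        }
        return rows;
    }

    // With the hint set to false: no DISTINCT in SQL, so LIMIT counts joined
    // rows, and duplicate parents are only removed afterwards, in memory.
    static List<Long> limitThenDeduplicate(List<Row> rows, int limit) {
        LinkedHashSet<Long> ids = new LinkedHashSet<>();
        for (Row r : rows.subList(0, Math.min(limit, rows.size()))) {
            ids.add(r.parentId);
        }
        return new ArrayList<>(ids);
    }

    // Without the hint: DISTINCT over the parent columns runs before LIMIT
    // in SQL, so LIMIT counts distinct parents.
    static List<Long> deduplicateThenLimit(List<Row> rows, int limit) {
        LinkedHashSet<Long> ids = new LinkedHashSet<>();
        for (Row r : rows) {
            ids.add(r.parentId);
        }
        List<Long> all = new ArrayList<>(ids);
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }
}
```

With 20 parents and two tags each, asking for 10 results gives 5 parents in the first case and 10 in the second, which matches the symptom described in the question.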
You could try to fetch join your child entities (instead of only joining them). This eliminates the problem of not being able to use the field you need to sort by in the DISTINCT column list and, in addition, lets you apply the hint, now legitimately.
But doing so will give you another problem: if you use join fetch and pagination to return the main entities and their collections, Hibernate will no longer apply pagination at the database level. It will not include OFFSET or LIMIT in the SQL statement, and it will try to paginate the results in memory instead. This is the famous Hibernate HHH000104 warning:
HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!
@VladMihalcea explains that in great detail in the last part of this article.
He also proposes one possible solution to your problem: window functions.
In your use case, instead of using Specifications, the idea is to implement your own DAO. This DAO only needs access to the EntityManager, which is not a big deal, as you can inject it with @PersistenceContext:
@PersistenceContext
protected EntityManager em;
Once you have this EntityManager, you can create native queries and use window functions to build, based on the provided Pageable information, the right SQL statement to issue against the database. This gives you much more freedom over which fields to sort by, or whatever else you need.
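A sketch of one window-function approach, under assumptions: the SQL in the comment uses hypothetical `event` and `tag` tables and `:offset`/`:size` placeholders, and the Java method only emulates `DENSE_RANK()` in memory to show which joined rows a page keeps. In a real DAO the SQL would go through `em.createNativeQuery(...)` instead:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// SQL shape being emulated (hypothetical table and column names):
//   SELECT * FROM (
//     SELECT e.*, t.name AS tag_name,
//            DENSE_RANK() OVER (ORDER BY e.created_on, e.id) AS rnk
//     FROM event e LEFT JOIN tag t ON t.event_id = e.id
//   ) ranked
//   WHERE rnk > :offset AND rnk <= :offset + :size
final class DenseRankPaging {

    // One joined row: parent id, the parent's sort key, and one child value.
    static final class JoinedRow {
        final long parentId;
        final String sortKey;
        final String child;
        JoinedRow(long parentId, String sortKey, String child) {
            this.parentId = parentId;
            this.sortKey = sortKey;
            this.child = child;
        }
    }

    // Ranks parents by (sortKey, parentId) and keeps every row whose rank lies
    // in (offset, offset + size], so all children of a kept parent survive.
    static List<JoinedRow> page(List<JoinedRow> rows, int offset, int size) {
        List<JoinedRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((JoinedRow r) -> r.sortKey)
                .thenComparingLong(r -> r.parentId));
        List<JoinedRow> result = new ArrayList<>();
        int rank = 0;
        JoinedRow previous = null;
        for (JoinedRow r : sorted) {
            // DENSE_RANK only increases when the ordering tuple changes.
            if (previous == null || previous.parentId != r.parentId
                    || !previous.sortKey.equals(r.sortKey)) {
                rank++;
            }
            previous = r;
            if (rank > offset && rank <= offset + size) {
                result.add(r);
            }
        }
        return result;
    }
}
```

Because the page boundary is counted in parents rather than joined rows, the page size stays correct even though each parent contributes several rows.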
As the last cited article indicates, window functions are supported by all major databases.
In the case of PostgreSQL, you can easily come across them in the official documentation.
Finally, one more option, in fact suggested by @nickshoe and explained in great detail in the article he cited, is to perform the sorting and paging in two phases. In the first phase, you create a query that references your child entities and in which you apply paging and sorting. This query identifies the ids of the main entities that will be used, in the second phase, to obtain the main entities themselves.
You can take advantage of the aforementioned custom DAO to accomplish this process.
It may be an off-topic answer, but it may help you.
You could try to tackle this problem (pagination of parent-child entities) by separating the query in two parts:
a query for retrieving the ids that match the given criteria
a query for retrieving the actual entities by the resulting ids of the previous query
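The two phases can be sketched in memory as follows, assuming a hypothetical `Parent` with a list of child values and a filter on those children. In JPA, phase 1 would roughly be a query selecting only ids with `setFirstResult`/`setMaxResults`, and phase 2 a `join fetch` query restricted to those ids; neither query is shown here:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class TwoPhasePaging {

    // Hypothetical parent row with its child values already attached.
    static final class Parent {
        final long id;
        final String name;
        final List<String> children;
        Parent(long id, String name, List<String> children) {
            this.id = id;
            this.name = name;
            this.children = children;
        }
    }

    // Phase 1: filter (possibly on child values), sort and page, returning only ids.
    // Because no collection is fetched here, a real query can page in SQL.
    static List<Long> pageOfIds(List<Parent> table, Predicate<Parent> filter,
                                Comparator<Parent> order, int offset, int size) {
        return table.stream()
                .filter(filter)
                .sorted(order)
                .skip(offset)
                .limit(size)
                .map(p -> p.id)
                .collect(Collectors.toList());
    }

    // Phase 2: load the full parents for exactly those ids and keep phase-1 order.
    // A real "where p.id in :ids" query may return rows in any order.
    static List<Parent> loadByIds(List<Parent> table, List<Long> ids) {
        Map<Long, Parent> byId = new HashMap<>();
        for (Parent p : table) {
            if (ids.contains(p.id)) {
                byId.put(p.id, p);
            }
        }
        List<Parent> result = new ArrayList<>();
        for (Long id : ids) {
            result.add(byId.get(id));
        }
        return result;
    }
}
```

Since phase 2 is bounded by the id list from phase 1, fetching collections there no longer interferes with the page size.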
I came across this solution in this blog post: https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/
I have searched Stack Overflow and other websites for the pros, cons, and conveniences of using Sets vs Lists, but I really couldn't find a DEFINITE answer on when to use one or the other.
Hibernate's documentation states that non-duplicate records should go into Sets and that you should then implement hashCode() and equals() for every single entity that could be wrapped in a Set. But then there is the price of convenience and ease of use: some articles recommend using business keys as every entity's id so that hashCode() and equals() can be implemented correctly in every situation, regardless of the object's state (managed, detached, etc.).
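A minimal sketch of the business-key idea, assuming the entity has a natural key that never changes after construction. `Book` and its `isbn` field are hypothetical, and the class is plain Java rather than a mapped entity; the generated id is deliberately left out of equals()/hashCode():

```java
import java.util.Objects;

// Hypothetical entity-like class whose identity is the immutable business key.
class Book {
    private Long id;       // assigned by the database later; not used for equality
    private String isbn;   // business key: set once, no setter
    private String title;  // ordinary mutable state

    Book(String isbn, String title) {
        this.isbn = Objects.requireNonNull(isbn);
        this.title = title;
    }

    void assignId(Long id) { this.id = id; }   // stands in for the INSERT assigning an id
    void setTitle(String title) { this.title = title; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        return isbn.equals(((Book) o).isbn);
    }

    @Override
    public int hashCode() {
        return isbn.hashCode();   // stable across transient, managed and detached states
    }
}
```

Because neither the id assignment nor a title change touches the key, an instance added to a HashSet before persisting is still found afterwards.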
It's all fine, all fine... until I come across lots of situations where using Sets is just not doable, such as ordering (though Hibernate gives you SortedSet), the convenience of collectionObj.get(index) and collectionObj.remove(int location || Object obj), Android's ListView/ExpandableListView architecture (GroupIds, ChildIds), and so on. My point is: Sets are just really bad (IMHO) to manipulate and make work 100%.
I am tempted to change every single collection in my project to a List, as they work very well. The IDs for all my entities are generated through MySQL's auto-increment (@GeneratedValue(strategy = GenerationType.IDENTITY)).
Is there anyone out there who could definitively clear up all these little details for me?
Also, is it doable to use Eclipse's auto-generated hashCode() and equals() for the ID field for every entity? Will it be effective in every situation?
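Whether id-only equals()/hashCode() works depends on when the id is assigned. With IDENTITY generation it is null until the entity is persisted, which a small sketch can illustrate. `Customer` mirrors the usual shape of an Eclipse-generated implementation over the id (an assumption about the generator's template), and `SaferCustomer` shows one commonly recommended alternative; both are plain classes, not mapped entities:

```java
// Shaped like an IDE-generated equals/hashCode over the id only.
class Customer {
    private Long id; // null until the database assigns it (IDENTITY)

    void assignId(Long id) { this.id = id; } // stands in for persist + flush

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((id == null) ? 0 : id.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Customer other = (Customer) obj;
        if (id == null) {
            // Two unsaved customers (both ids null) compare as equal.
            return other.id == null;
        }
        return id.equals(other.id);
    }
}

// Alternative: unsaved instances are only equal to themselves, and hashCode
// is constant per class, so it cannot change when the id is assigned.
class SaferCustomer {
    private Long id;

    void assignId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        SaferCustomer other = (SaferCustomer) obj;
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}
```

With `Customer`, two new instances collapse into one Set element, and an instance added before persisting can no longer be found once its hashCode changes. The constant hashCode() in `SaferCustomer` puts all instances in one hash bucket, which is usually acceptable for the modest sizes of a mapped association.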
Thank you very much,
Renato
List versus Set
Duplicates allowed
Lists allow duplicates and Sets do not allow duplicates. For some this will be the main reason for them choosing List or Set.
Multiple Bags Exception - multiple eager fetching in the same query
One notable difference in the handling of Hibernate is that you can't fetch two different lists in a single query.
It will throw the "cannot simultaneously fetch multiple bags" exception. With Sets, there is no such issue.
A list, if there is no index column specified, will just be handled as a bag by Hibernate (no specific ordering).
@OneToMany
@OrderBy("lastname ASC")
public List<Rating> ratings;
One notable difference in the handling of Hibernate is that you can't fetch two different lists in a single query. For example, if you have a Person entity having a list of contacts and a list of addresses, you won't be able to use a single query to load persons with all their contacts and all their addresses. The solution in this case is to make two queries (which avoids the cartesian product), or to use a Set instead of a List for at least one of the collections.
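Roughly why Hibernate refuses two bags: joining both collections in one SQL query returns their cartesian product, so each contact appears once per address. A List (bag) cannot tell those repeats from genuine duplicates, while a Set collapses them. The toy model below uses hypothetical contact and address strings for a single person and does not call Hibernate:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of fetching a person's contacts and addresses in one joined query.
final class CartesianProduct {

    // Each SQL row carries one contact and one address: contacts x addresses.
    static List<String[]> joinedRows(List<String> contacts, List<String> addresses) {
        List<String[]> rows = new ArrayList<>();
        for (String c : contacts) {
            for (String a : addresses) {
                rows.add(new String[] {c, a});
            }
        }
        return rows;
    }

    // Reading the contact column into a bag keeps every repetition.
    static List<String> contactsAsBag(List<String[]> rows) {
        List<String> bag = new ArrayList<>();
        for (String[] r : rows) {
            bag.add(r[0]);
        }
        return bag;
    }

    // Reading the same column into a Set collapses the repetitions.
    static Set<String> contactsAsSet(List<String[]> rows) {
        Set<String> set = new LinkedHashSet<>();
        for (String[] r : rows) {
            set.add(r[0]);
        }
        return set;
    }
}
```

For three contacts and four addresses the query yields 12 rows, so a bag would show 12 contacts where a Set shows 3.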
It's often hard to use Sets with Hibernate when you have to define equals and hashCode on the entities and don't have an immutable functional key in the entity.
Furthermore, I suggest this link.
I have a many-to-many relationship in my Java beans. When I use a List to define my variable, like this:
@Entity
@Table(name="ScD")
public class Group extends Nameable {
@ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE}, fetch = FetchType.EAGER)
@JoinColumn(name="b_fk")
private List<R> r;
//or
private Set<R> r;
I get this error:
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor#0'
...
When I use a Set, everything seems to work well.
When using many-to-many relationships, which one should I use for the logical concept, List or Set? A List may contain duplicates and a Set may not, but what about performance and other issues?
From a relational database's perspective this is a set. Databases do not preserve order, so using a List is meaningless: its order is unspecified (unless you use so-called indexed collections).
Using a Set also has significant performance implications. When a List is used, Hibernate uses a PersistentBag collection underneath, which has some terrible characteristics. For example, if you add a new relationship, it will first delete all existing ones and then insert them back plus your new one. With a Set it just inserts the new record.
Third, you cannot eagerly fetch multiple Lists in one entity, as you will get the infamous "cannot simultaneously fetch multiple bags" exception.
See also:
19.5. Understanding Collection performance
Why Hibernate does "delete all then re-insert" - it's not so strange
How about the uniqueness requirement from Set? Doesn't this force Hibernate to retrieve all objects each time one is added to the collection to make sure a newly added one is unique? A List wouldn't have this limitation.
I know the question was asked years ago, but I wanted to comment on this topic, just in case someone is unsure about the Set vs List issue.
Regarding lazy fetching, I think a bag (a list without an index) would be a better option, because you avoid retrieving all objects each time one is added to the collection in order to:
make sure a newly added one is unique, in case you are using a set
preserve order, in case you are using a list (with index)
Please correct me if I'm mistaken.
I have the following question about JPA:
Can I save the order of the elements in a java.util.List? In my application the order in which I put elements in the Lists is important but after I get those collections from the database the order is not the same (as expected). Can you show me a way to deal with this problem?
P.S. There is no field in the entities I put in the collections by which I can order them.
Rosen
There are some hacky ways of doing this in JPA 1, but it's easiest to switch to a JPA 2 provider. The @OrderColumn annotation support is what you're looking for. EclipseLink has an OK tutorial on how to use it.
JPA has two types of Lists. In JPA 1 there is an "ordered list" (which is what you see, with ordering defined by some SQL clause). In JPA 2 you can have "ordered lists" or, alternatively, "indexed lists" (where the order of creation is preserved), which is what the @OrderColumn referred to above provides. Any JPA 2 implementation has to support this, e.g. DataNucleus.
JDO has had indexed lists since day 1
You can save the order of the elements in a java.util.List. In JPA 2.0, a good way to save the order of elements is the @OrderColumn annotation.
For details, you can refer to this link:
Order Column (JPA 2.0)
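Conceptually, @OrderColumn stores each element's list index in an extra column and rebuilds the list by sorting on that column when loading. The plain-Java sketch below imitates that round trip; the mapping in the comment uses hypothetical names (`items`, `position`), and reversing the stored rows merely simulates the database returning them in an arbitrary order:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Mapping this sketch imitates (hypothetical entity and column names):
//   @OneToMany
//   @OrderColumn(name = "position")
//   private List<Item> items;
final class OrderColumnSketch {

    // A stored row: the element plus its index column.
    static final class StoredRow {
        final String value;
        final int position;
        StoredRow(String value, int position) {
            this.value = value;
            this.position = position;
        }
    }

    // "Persist": write each element together with its current list index.
    static List<StoredRow> store(List<String> list) {
        List<StoredRow> rows = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            rows.add(new StoredRow(list.get(i), i));
        }
        // The database promises no row order; reversing simulates that.
        Collections.reverse(rows);
        return rows;
    }

    // "Load": sort by the stored index to rebuild the original order.
    static List<String> load(List<StoredRow> rows) {
        List<StoredRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt(r -> r.position));
        List<String> result = new ArrayList<>();
        for (StoredRow r : sorted) {
            result.add(r.value);
        }
        return result;
    }
}
```

This is why no ordering field is needed in the element entities themselves: the position lives in the extra column.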
Is anyone aware of the validity of Hibernate's Criteria.list() and Query.list() methods returning multiple occurrences of the same entity?
Occasionally I find when using the Criteria API, that changing the default fetch strategy in my class mapping definition (from "select" to "join") can sometimes affect how many references to the same entity can appear in the resulting output of list(), and I'm unsure whether to treat this as a bug or not. The javadoc does not define it, it simply says "The list of matched query results." (thanks guys).
If this is expected and normal behaviour, then I can de-dup the list myself, that's not a problem, but if it's a bug, then I would prefer to avoid it, rather than de-dup the results and try to ignore it.
Anyone got any experience of this?
Yes, getting duplicates is perfectly possible if you construct your queries so that this can happen. See for example Hibernate CollectionOfElements EAGER fetch duplicates elements
I also started noticing this behavior in my Java API as it started to grow. Glad there is an easy way to prevent it. As a matter of practice, I've started appending:
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
To all of my criteria that return a list. For example:
List<PaymentTypeAccountEntity> paymentTypeAccounts = criteria()
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
.list();
If you have an object which has a list of sub objects on it, and your criteria joins the two tables together, you could potentially get duplicates of the main object.
One way to ensure that you don't get duplicates is to use a DistinctRootEntityResultTransformer. The main drawback to this is if you are using result set buffering/row counting. The two don't work together.
I had the exact same issue with the Criteria API. The simple solution for me was to set distinct to true on the query, like this:
CriteriaQuery<Foo> query = criteriaBuilder.createQuery(Foo.class);
query.distinct(true);
Another possible option that came to my mind before would be to simply pass the resulting list to a Set which will also by definition have just an object's single instance.
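The Set idea can be made order-preserving with a `LinkedHashSet`, which matters when the query had an ORDER BY. A minimal sketch; `distinctPreservingOrder` is a hypothetical helper name. It relies on equals()/hashCode(), which for entities that don't override them means identity; within one persistence context Hibernate returns the same instance for the same row, so identity is enough there:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class Dedup {

    // Removes repeated elements, keeping the order of first occurrence.
    static <T> List<T> distinctPreservingOrder(List<T> results) {
        return new ArrayList<>(new LinkedHashSet<>(results));
    }
}
```

As with DISTINCT_ROOT_ENTITY, this runs after the query, so if the query was paginated over joined rows the page can still come back smaller than requested.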