List vs Set on JPA 2 - Pros / Cons / Convenience

I have searched Stack Overflow and other websites for the pros, cons and conveniences of using Sets vs Lists, but I couldn't find a DEFINITE answer on when to use which.
Hibernate's documentation states that non-duplicate records should go into Sets and that you should therefore implement hashCode() and equals() for every entity that could be put into a Set. But that comes at a price in convenience and ease of use: some articles recommend using business keys as every entity's id, so that hashCode() and equals() can be implemented correctly for every situation regardless of the object's state (managed, detached, etc.).
That's all fine... until I run into the many situations where Sets are just not practical: ordering (though Hibernate offers SortedSet), the convenience of collectionObj.get(index) and collectionObj.remove(int index) or remove(Object obj), Android's ListView/ExpandableListView architecture (GroupIds, ChildIds), and so on. My point is: Sets are just really awkward (imho) to manipulate and get working 100%.
I am tempted to change every single collection in my project to a List, as they work very well. The IDs of all my entities are generated by MySQL's auto-increment columns (@GeneratedValue(strategy = GenerationType.IDENTITY)).
Is there anyone out there who could definitively clear up the details mentioned above?
Also, is it workable to use Eclipse's auto-generated hashCode() and equals() on the ID field for every entity? Will that be effective in every situation?
Thank you very much,
Renato

List versus Set
Duplicates allowed
Lists allow duplicates and Sets do not allow duplicates. For some this will be the main reason for them choosing List or Set.
Multiple bags exception - eager fetching of several collections in the same query
One notable difference in how Hibernate handles the two is that you can't fetch two different Lists (bags) in a single query.
Hibernate throws a MultipleBagFetchException ("cannot simultaneously fetch multiple bags"). With Sets there is no such issue.

A list, if there is no index column specified, will just be handled as a bag by Hibernate (no specific ordering).
@OneToMany
@OrderBy("lastname ASC")
public List<Rating> ratings;
One notable difference in the handling of Hibernate is that you can't fetch two different lists in a single query. For example, if you have a Person entity having a list of contacts and a list of addresses, you won't be able to use a single query to load persons with all their contacts and all their addresses. The solution in this case is to make two queries (which avoids the cartesian product), or to use a Set instead of a List for at least one of the collections.
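To make the cartesian product mentioned above concrete, here is a minimal plain-Java sketch with no Hibernate involved. The `joinRows` helper and the sample data are hypothetical. It only mimics the shape of the result set that a single joined query over contacts and addresses would return for one person.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CartesianProductSketch {

    // Mimics the rows of "person JOIN contacts JOIN addresses" for one person:
    // every contact is paired with every address.
    static List<String[]> joinRows(List<String> contacts, List<String> addresses) {
        List<String[]> rows = new ArrayList<>();
        for (String contact : contacts) {
            for (String address : addresses) {
                rows.add(new String[] {contact, address});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> contacts = Arrays.asList("mail", "phone", "fax");
        List<String> addresses = Arrays.asList("home", "work");
        List<String[]> rows = joinRows(contacts, addresses);

        // Collecting the contact column into a List keeps the repeats,
        // while a Set collapses them back to the real contacts.
        List<String> contactsAsList = new ArrayList<>();
        Set<String> contactsAsSet = new LinkedHashSet<>();
        for (String[] row : rows) {
            contactsAsList.add(row[0]);
            contactsAsSet.add(row[0]);
        }
        System.out.println(rows.size());           // 6 rows for 3 contacts x 2 addresses
        System.out.println(contactsAsList.size()); // 6
        System.out.println(contactsAsSet.size());  // 3
    }
}
```

A bag has no index or uniqueness rule to tell a repeated row from a genuine duplicate, which is presumably why Hibernate refuses the query rather than guess. A Set de-duplicates naturally, and two separate queries never produce the repeats in the first place.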
It's often hard to use Sets with Hibernate when you have to define equals and hashCode on the entities and don't have an immutable functional key in the entity.
Furthermore, I suggest this link.
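For entities with database-generated IDs and no immutable business key, one commonly used compromise is an equals() based on the ID plus a constant hashCode(). The sketch below is a plain-Java stand-in (the `Rating` fields and the `setId` call that simulates the insert are assumptions for illustration), not a definitive recipe.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java stand-in for an entity whose id is assigned by the database on insert.
// In a real mapping the class would carry @Entity, and id would carry @Id and
// @GeneratedValue(strategy = GenerationType.IDENTITY).
class Rating {
    private Long id;               // null until the row has been inserted
    private final String comment;

    Rating(String comment) {
        this.comment = comment;
    }

    Long getId() { return id; }

    // Simulates the provider assigning the generated id at flush time.
    void setId(Long id) { this.id = id; }

    String getComment() { return comment; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Rating)) return false;
        Rating other = (Rating) o;
        // Without an id, an instance is only equal to itself.
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        // Deliberately constant: the hash must not change when the id is assigned,
        // otherwise a HashSet would lose track of the element.
        return Rating.class.hashCode();
    }
}

class RatingEqualityDemo {
    public static void main(String[] args) {
        Set<Rating> ratings = new HashSet<>();
        Rating rating = new Rating("great");
        ratings.add(rating);   // added while still transient (id == null)
        rating.setId(42L);     // id assigned later, as after an insert
        System.out.println(ratings.contains(rating)); // true
    }
}
```

An IDE-generated hashCode() that hashes the id field would return a different value before and after the insert, so an element added to a HashSet while transient could no longer be found afterwards. The price of the constant hash is that all instances share one bucket, which is acceptable for small collections but degrades for large ones.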

Related

How to prevent Hibernate load the whole collection?

For example, I have the following entity:
class User{
...
private Set questions;
...
}
When I operate the domain model:
user.questions.add(...);
Hibernate will load ALL the questions of this collection, even if I set the collection to LAZY. How can I change this behavior?

You'll have to annotate the collection with
@LazyCollection(LazyCollectionOption.EXTRA)

TL;DR
Don't do this; let the whole collection load on demand.
Details
I think you should reconsider your desire to avoid loading the collection when its elements are updated via add() method call. Here are my arguments:
When you add an element, the result (i.e. whether the element was added or not) might depend on the type of the collection. For a Set, it definitely depends on the collection's contents.
In terms of business logic, your Set represents some questions related to a user. Let's imagine you achieved the result you want: the first five questions are in the collection, the remaining ten are not. What is the business meaning of the collection then? Sounds really questionable.
If you consider my arguments bad, feel free to use the techniques described in other answers.
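The first argument can be seen with plain JDK collections. This is only a sketch (the `addTwice` helper is hypothetical), but it shows why a Set has to know its contents before it can answer add(), while a List does not.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

class AddSemanticsSketch {

    // Adds the same element twice and returns what each add() call reported.
    static boolean[] addTwice(Collection<String> target, String element) {
        boolean first = target.add(element);
        boolean second = target.add(element);
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] onSet = addTwice(new HashSet<String>(), "question-1");
        boolean[] onList = addTwice(new ArrayList<String>(), "question-1");
        System.out.println(onSet[0] + " " + onSet[1]);   // true false
        System.out.println(onList[0] + " " + onList[1]); // true true
    }
}
```

Because Set.add() must return false for an element that is already present, a persistent Set cannot report the result without looking at the stored elements in some way.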

So the actual problem is: when I apply cascading, there is a performance issue because there can be many questions.
My answer would then be: don't use cascading to save questions; persist the question with a regular EntityManager.persist() call.
Pretty obvious, right?

If the result set is very large, you can process it in batches by limiting the number of rows returned per query:
query.setMaxResults(int maxResults)

Hibernate Many to Many Relations Set Or List?

I have a many-to-many relationship in my Java beans. When I define the field as a List, like this:
@Entity
@Table(name="ScD")
public class Group extends Nameable {
@ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE}, fetch = FetchType.EAGER)
@JoinColumn(name="b_fk")
private List<R> r;
//or
private Set<R> r;
I get this error:
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor#0'
...
When I use Set, everything seems to work well.
My question is: for many-to-many relationships, which is the logically right concept, List or Set? A List may contain duplicates and a Set may not, but what about performance and other issues?

From a relational database perspective, this is a set. Databases do not preserve order, so using a List is meaningless: the order of its elements is unspecified (unless you use so-called indexed collections).
Using a Set also has great performance implications. When a List is used, Hibernate uses a PersistentBag collection underneath, which has some terrible characteristics. For example, if you add a new relationship, it will first delete all existing ones and then insert them back plus your new one. With a Set it just inserts the new record.
Third thing: you cannot eagerly fetch multiple Lists of one entity at the same time, as you will get the infamous "cannot simultaneously fetch multiple bags" exception.
See also:
19.5. Understanding Collection performance
Why Hibernate does "delete all then re-insert" - its not so strange

How about the uniqueness requirement from Set? Doesn't this force Hibernate to retrieve all objects each time one is added to the collection to make sure a newly added one is unique? A List wouldn't have this limitation.

I know the question was made years ago but I wanted to comment on this topic, just in case someone is doubtful about the set vs list issue.
Regarding lazy fetching, I think a bag (list without index) would be a better option due to the fact that you avoid retrieving all objects each time one is added to the collection to:
make sure a newly added one is unique, in case you are using a set
preserve order, in case you are using a list (with index)
Please correct me if I'm mistaken.

Java - how best to perform set-like operations (e.g. retainAll) based on custom comparison

I have two sets both containing the same object types. I would like to be able to access the following:
the intersection of the 2 sets
the objects contained in set 1 and not in set 2
the objects contained in set 2 and not in set 1
My question relates to how best to compare the two sets to acquire the desired views. The class in question has numerous id properties which can be used to uniquely identify that entity. However, there are also numerous properties in the class that describe the current status of the object. The two sets can contain objects that match according to the ids, but which are in a different state (and as such, not all properties are equal between the two objects).
So - how do I best implement my solution. To implement an equals() method for the class which does not take into account the status properties and only looks at the id properties would not seem to be very true to the name 'equals' and could prove to be confusing later on. Is there some way I can provide a method through which the comparisons are done for the set methods?
Also, I would like to be able to access the 3 views described above without modifying the original sets.
All help is much appreciated!

(Edit: My first suggestion has been removed because of an unfortunate implementation detail in TreeSet, as pointed out by Martin Konecny. Some collection classes (e.g. TreeSet) allow you to supply a Comparator that is to be used to compare elements, so you might want to use one of those classes - at least, if there is some natural way of ordering your objects.)
If not (i.e. if it would be difficult to implement compareTo(), while it would be simpler to implement hashCode() and equals()), you could create a wrapper class which implements those two methods by looking at the relevant fields of the objects it wraps, and create a regular HashSet of these wrapper objects.
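A minimal sketch of the wrapper idea, assuming a hypothetical `Device` class with a single identifying field `id` and a `status` field that should be ignored for comparison (a class with several id properties would compare all of them):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical domain class: 'id' identifies the object, 'status' is state.
class Device {
    final String id;
    final String status;

    Device(String id, String status) {
        this.id = id;
        this.status = status;
    }
}

// Wrapper whose equals()/hashCode() look only at the identifying field.
final class DeviceById {
    final Device device;

    DeviceById(Device device) {
        this.device = device;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DeviceById && device.id.equals(((DeviceById) o).device.id);
    }

    @Override
    public int hashCode() {
        return device.id.hashCode();
    }
}

class WrapperSketch {

    // Wraps every device so a regular HashSet compares them by id only.
    static Set<DeviceById> wrapAll(Iterable<Device> devices) {
        Set<DeviceById> wrapped = new HashSet<>();
        for (Device d : devices) {
            wrapped.add(new DeviceById(d));
        }
        return wrapped;
    }

    public static void main(String[] args) {
        Set<DeviceById> set1 = wrapAll(Arrays.asList(new Device("1", "on"), new Device("2", "off")));
        Set<DeviceById> set2 = wrapAll(Arrays.asList(new Device("2", "on"), new Device("3", "off")));

        Set<DeviceById> inBoth = new HashSet<>(set1); // copy, so set1 stays untouched
        inBoth.retainAll(set2);
        System.out.println(inBoth.size()); // 1 (the device with id "2")
    }
}
```

The domain class keeps whatever equals() it already has; only the wrapper defines the id-based comparison.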

Short version: implement equals based on the entity's key, not state.
Slightly longer version: What the equals method should check depends on the type of object. For something that's considered a "value" object (say, an Integer or String or an Address), equality is typically based on all fields being the same. For an object with a set of fields that uniquely identify it (its primary key), equality is typically based on the fields of the primary key only. Equality doesn't necessarily need to (and often shouldn't) take into consideration the state of an object. It needs to determine whether two objects are representations of the same thing. Also, for objects that are used in a Set or as keys in a Map, the fields that are used to determine equality should generally not be mutable, since changing them could cause a Set/Map to stop working as expected.
Once you've implemented equals like this, you can use Guava to view the differences between the two sets:
Set<Foo> notInSet2 = Sets.difference(set1, set2);
Set<Foo> notInSet1 = Sets.difference(set2, set1);
Both difference sets will be live views of the original sets, so changes to the original sets will automatically be reflected in them.
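If Guava is not available, or snapshots are preferred over live views, the same three results (including the intersection, which Guava also offers as Sets.intersection) can be sketched with the JDK alone. The helper names below are hypothetical; each one copies its first argument so the original sets are never modified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetViewsSketch {

    // Elements present in both a and b, as an independent copy.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Elements present in a but not in b, as an independent copy.
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> set1 = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> set2 = new HashSet<>(Arrays.asList(2, 3, 4));

        System.out.println(intersection(set1, set2)); // [2, 3]
        System.out.println(difference(set1, set2));   // [1]
        System.out.println(difference(set2, set1));   // [4]
    }
}
```

Unlike the Guava views, these copies do not reflect later changes to set1 or set2.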

This is a requirement for which the Standard C++ Library fares better with its set type, which accepts a comparator for this purpose. In the Java library, your need is modeled better by a Map: one mapping from your candidate key to either the rest of the status-related fields, or to the complete object that happens to also contain the candidate key. (Note that the C++ set type is mandated to be some sort of balanced tree, usually implemented as a red-black tree, which means it's equivalent to Java's TreeSet, which does accept a custom Comparator.) It's ugly to duplicate the data, but it's also ugly to try to work around it, as you've already found.
If you have control over the type in question and can split it up into separate candidate key and status parts, you can eliminate the duplication. If you can't go that far, consider combining the candidate key fields into a single object held within your larger, complete object; that way, the Map key type will be the same as that candidate key type, and the only storage overhead will be the map keys' object references. The candidate key data would not be duplicated.
Note that most set types are implemented as maps under the covers; they map from the would-be set element type to something like a Boolean flag. Apparently there's too much code that would be duplicated in wholly disjoint set and map types. Once you realize that, backing up from using a set in an awkward way to using a map no longer seems to impose the storage overhead you thought it would.
It's a somewhat depressing realization, having chosen the mathematically correct idealized data structure, only to find it's a false choice down a layer or two, but even in your case your problem sounds better suited to a map representation than a set. Think of it as an index.
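A minimal sketch of the map-as-index idea, assuming a hypothetical `Order` class whose candidate key is a single String (a composite key would use a small key class with its own equals() and hashCode()):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class: 'key' is the candidate key, 'status' is state.
class Order {
    final String key;
    final String status;

    Order(String key, String status) {
        this.key = key;
        this.status = status;
    }
}

class KeyIndexSketch {

    // Builds an index from candidate key to the complete object.
    static Map<String, Order> indexByKey(Iterable<Order> orders) {
        Map<String, Order> index = new HashMap<>();
        for (Order order : orders) {
            index.put(order.key, order);
        }
        return index;
    }

    // Keys present in both indexes. keySet() is a live view of the map,
    // so it is copied before retainAll() to leave the maps unmodified.
    static Set<String> commonKeys(Map<String, Order> a, Map<String, Order> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.retainAll(b.keySet());
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Order> first = indexByKey(Arrays.asList(new Order("A", "new"), new Order("B", "paid")));
        Map<String, Order> second = indexByKey(Arrays.asList(new Order("B", "shipped"), new Order("C", "new")));

        for (String key : commonKeys(first, second)) {
            // Both versions of the same order are reachable through the shared key.
            System.out.println(key + ": " + first.get(key).status + " -> " + second.get(key).status);
        }
    }
}
```

With this layout the class does not need an id-only equals() at all, and the two differently-stated versions of an object can be compared side by side.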

Converting Between Hibernate Collections and My Own Collections

I have set up Hibernate to give me a Set<Integer> which I convert internally to and from a Set<MyObjectType> (MyObjectType can be represented by a single integer). That is to say, When Hibernate calls my void setMyObjectTypeCollection(Set<Integer> theSet) method I iterate through all the elements in theSet and convert them to MyObjectType. When Hibernate calls my Set<MyObjectType> getMyObjectTypeCollection() I allocate a new HashSet and convert MyObjectTypes to Integers.
The problem is that every time I call commit, Hibernate deletes everything in the collection and then re-inserts it regardless of whether any element of the collection has changed or even that the collection itself has changed.
While I don't technically consider this a bug, I am afraid that deleting and inserting many rows very often will cause the database to perform unnecessarily slowly.
Is there a way to get Hibernate to recognize that even though I have allocated and returned a different instance of the collection, that the collection actually contains all the items it used to and that there is no need to delete and reinsert them all?

I think the best way to achieve your goal would be to use a UserType. Basically it lets you handle the conversion from SQL to your own objects (back and forth).
You can see an example on how to use it here.

Is it valid for Hibernate list() to return duplicates?

Is anyone aware of the validity of Hibernate's Criteria.list() and Query.list() methods returning multiple occurrences of the same entity?
Occasionally I find when using the Criteria API, that changing the default fetch strategy in my class mapping definition (from "select" to "join") can sometimes affect how many references to the same entity can appear in the resulting output of list(), and I'm unsure whether to treat this as a bug or not. The javadoc does not define it, it simply says "The list of matched query results." (thanks guys).
If this is expected and normal behaviour, then I can de-dup the list myself, that's not a problem, but if it's a bug, then I would prefer to avoid it, rather than de-dup the results and try to ignore it.
Anyone got any experience of this?

Yes, getting duplicates is perfectly possible if you construct your queries so that this can happen. See for example Hibernate CollectionOfElements EAGER fetch duplicates elements

I also started noticing this behavior in my Java API as it started to grow. Glad there is an easy way to prevent it. As a matter of habit, I've started appending:
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
to all of my criteria that return a list. For example:
List<PaymentTypeAccountEntity> paymentTypeAccounts = criteria()
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
.list();

If you have an object which has a list of sub objects on it, and your criteria joins the two tables together, you could potentially get duplicates of the main object.
One way to ensure that you don't get duplicates is to use a DistinctRootEntityResultTransformer. The main drawback to this is if you are using result set buffering/row counting. The two don't work together.

I had the exact same issue with Criteria API. The simple solution for me was to set distinct to true on the query like
CriteriaQuery<Foo> query = criteriaBuilder.createQuery(Foo.class);
query.distinct(true);
Another option that came to my mind earlier would be to simply copy the resulting list into a Set, which by definition holds only a single instance of each object.
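A small sketch of that last option, using a LinkedHashSet so the order of the query result is kept. The `distinctInOrder` helper is hypothetical, and Strings stand in for the entities here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DedupSketch {

    // Removes repeated elements while keeping the first occurrence of each in place.
    static <T> List<T> distinctInOrder(List<T> results) {
        return new ArrayList<>(new LinkedHashSet<>(results));
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(distinctInOrder(results)); // [a, b, c]
    }
}
```

This relies on equals() and hashCode() of the elements. Within a single session the repeated entries of a joined result are normally the very same object instance, so even the default identity-based equality should be enough in that case.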
