I need a many-to-many relationship within the same database. I don't mind creating mapping tables, but I want to end up with only one entity.
Let's say I've got a resource which can have many resources (sub-resources). What I need is a Resource with its Sub-Resources and also the count of them, because one Resource can have x resources.
Essentially, I need this with the extra attribute of the count of sub-resources needed for the Resource.
@Entity
@Table(name = "resources")
public class Resources {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;

    @Column
    private String name;

    @ManyToMany
    private Collection<Resources> subResources;
}
To clarify that a bit: at best I would have something like this:
@Entity
@Table(name = "resources")
public class Resources {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;

    @Column
    private String name;

    @ManyToMany
    private HashMap<Resources, Integer /* count */> subResources;
}
I know how it works with two tables (Resources & Sub-Resources) and a mapping table, but I couldn't figure out how to do it as described above, since Resources can be Sub-Resources at the same time.
Thanks in advance
EDIT: I need an extra attribute in the mapping table where I can set the number of sub-resources as an Integer.
The configuration you have will work for a unidirectional relationship. There is no technical problem; you just will not be able to specify the multiple parents of a sub-resource, so in the end it is not many-to-many.
To make it truly many-to-many you need another field on the Resources class to define the inverse side of the relationship. I have added the @JoinTable annotation to make the names in the join table explicit, but it is optional if the defaults are good enough for you. I also switched from the very basic Collection to List; I would prefer Set, but then you would have to provide equals and hashCode on the entity. Finally, I always initialize collection-valued fields (ArrayList here; HashSet if you go for Set) to avoid silly NullPointerExceptions or complex initialization code:
@ManyToMany
@JoinTable(
    name = "RESOURCE_SUBRESOURCE",
    joinColumns = @JoinColumn(name = "resource_id"),
    inverseJoinColumns = @JoinColumn(name = "subresource_id")
)
private List<Resources> subResources = new ArrayList<>();

// the mappedBy signals that this is the inverse side of the relation, not a new relation altogether
@ManyToMany(mappedBy = "subResources")
private List<Resources> parentResources = new ArrayList<>();
Use as:
Resources r1 = new Resources();
r1.setName("alpha");
em.persist(r1);
Resources r2 = new Resources();
r2.setName("beta");
r2.getSubResources().add(r1);
em.persist(r2);
Resources r3 = new Resources();
r3.setName("gama");
em.persist(r3);
Resources r4 = new Resources();
r4.setName("delta");
// won't work, you need to set the owning side of the relationship, not the inverse:
// r4.setParentResources(Arrays.asList(r2, r3));
// will work like this:
r2.getSubResources().add(r4);
r3.getSubResources().add(r4);
// I believe that the order of the following operations is important, unless you set cascade on the relationship
em.persist(r4);
r2 = em.merge(r2);
r3 = em.merge(r3);
As for the count: in the question you mention that you want a count of related objects. While specific JPA providers (Hibernate, EclipseLink) may allow you to accomplish this (using a read-only field that is populated by an aggregate query - COUNT(*) FROM JoinTable WHERE resource_id=?), it is not standard. You can always call resource.getSubResources().size(), but that would fetch all the sub-resources into memory, which is not a good thing and might in fact be a really bad thing if you call it frequently or there are many sub/parent resources.
I would prefer to run a separate count query, perhaps even for a set of resource ids, whenever I really need this.
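For instance, such a count query could look like this (a sketch, assuming the Resources entity above; the query and variable names are illustrative, not from the original answer):

// counts sub-resources for a batch of resource ids without loading them into memory
List<Object[]> rows = em.createQuery(
        "SELECT r.id, COUNT(s) FROM Resources r " +
        "LEFT JOIN r.subResources s " +
        "WHERE r.id IN :ids GROUP BY r.id", Object[].class)
    .setParameter("ids", Arrays.asList(r1.getId(), r2.getId()))
    .getResultList();

for (Object[] row : rows) {
    long resourceId = (long) row[0];
    long subResourceCount = (long) row[1]; // 0 for resources with no sub-resources
}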
Related
I have a problem where loading my lazy collections produces a lot of SQL statements, and I wonder if there is a more efficient way of loading the data.
Situation:
Parent has a lazy collection of Child called children. It is actually a Many-To-Many relation.
I load a list of Parents with a CrudRepository, and I need to get all child_ids for each Parent. So every time I access the children collection, a new SQL statement is executed.
If I load 200 Parents, 201 queries are executed (1 for the list of Parents and 1 for each Parent's children).
Any idea how I can load the data with just one query?
EDIT 1
Parent/Child is probably bad naming here. In fact I have a many-to-many relation.
Here is some code:
@Entity
public class Tour {
    @Id
    @GeneratedValue(generator = "system-uuid")
    @GenericGenerator(name = "system-uuid", strategy = "uuid2")
    @Column(length = 60)
    private String id;

    @ManyToMany
    @JoinTable(
        name = "parent_images",
        joinColumns = @JoinColumn(name = "tour_id", referencedColumnName = "id"),
        inverseJoinColumns = @JoinColumn(name = "image_id", referencedColumnName = "id"),
        foreignKey = @ForeignKey(name = "FK_TOUR_IMAGE_TOUR"),
        inverseForeignKey = @ForeignKey(name = "FK_TOUR_IMAGE_IMAGE")
    )
    private List<Image> images = new ArrayList<>();
}
@Entity
public class Image {
    @Id
    @GeneratedValue(generator = "system-uuid")
    @GenericGenerator(name = "system-uuid", strategy = "uuid2")
    @Column(length = 40)
    private String id;
    //....
}
// Code to fetch:
@Autowired
TourRepository repo;

List<Tour> tours = repo.findBy(....);
List<String> imageIds = new ArrayList<>();
for (Tour tour : tours) {
    imageIds.addAll(tour.getImages().stream().map(b -> b.getId()).collect(Collectors.toList()));
}
As another answer suggested, JOIN FETCH is usually the way to solve this kind of problem. What happens internally for a join fetch is that the generated SQL will contain the columns of the join-fetched entities.
However, you shouldn't blindly treat join fetch as a panacea.
One common case is when you want to retrieve entities with two One-To-Many relationships. For example, you have User, and each User may have multiple Address and Phone entries.
If you naively do a from User user join fetch user.phones join fetch user.addresses, Hibernate will either report a problem with your query or generate an inefficient query containing the Cartesian product of addresses and phones.
In the above case, one solution is to break it into multiple queries:
from User user join fetch user.phones where .... followed by from User user join fetch user.addresses where .....
Keep in mind: a smaller number of SQL statements does not always mean better performance. In some situations, breaking up queries may improve performance.
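A sketch of that two-query approach with an EntityManager, assuming hypothetical User, Phone and Address entities as described (the elided where clauses are left out):

// both queries run in the same persistence context, so the second query
// initializes the addresses collection on the same managed User instances
List<User> users = em.createQuery(
        "SELECT DISTINCT u FROM User u JOIN FETCH u.phones", User.class)
    .getResultList();

em.createQuery(
        "SELECT DISTINCT u FROM User u JOIN FETCH u.addresses", User.class)
    .getResultList();

// users now have both phones and addresses initialized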
That's the whole idea behind lazy collections :)
Meaning, a lazy collection will only be queried if the getter for that collection is called. What you're describing is that you load all entities and something (code, framework, whatever) calls getChildren (assumption) on each entity; this will produce those queries.
Now, if this is always happening, then first of all there's little point in having a lazy collection, so you could set them as EAGER. - EDIT: as said in the comments, EAGER is rarely the solution; in this case in particular it definitely does not seem like it, but the join is :)
Either way, for your case that won't help. What you want is to load all the data at once, I assume; for that, when you do the query you have to make the join explicit, for example with JPQL:
SELECT p FROM Parent p LEFT JOIN FETCH p.children
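Since the question loads through a CrudRepository, the same join fetch can also be declared on the repository; a minimal sketch, assuming a hypothetical ParentRepository (entity and method names are illustrative; @Query is Spring Data JPA's org.springframework.data.jpa.repository.Query):

public interface ParentRepository extends CrudRepository<Parent, Long> {

    // DISTINCT avoids duplicate Parent rows caused by the fetch join
    @Query("SELECT DISTINCT p FROM Parent p LEFT JOIN FETCH p.children")
    List<Parent> findAllWithChildren();
}

This loads the parents and their children in one SQL statement instead of 201.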
I have an entity with a string id:
@Table
@Entity
public class Stock {
    @Id
    @Column(nullable = false, length = 64)
    private String index;

    @Column(nullable = false)
    private Integer price;
}
And a JpaRepository for it:
public interface StockRepository extends JpaRepository<Stock, String> {
}
When I call stockRepository::findAll, I have the N + 1 problem:
logs are simplified
select s.index, s.price from stock s
select s.index, s.price from stock s where s.index = ?
The last line from the quote is executed about 5K times (the size of the table). Also, when I update prices, I do the following:
stockRepository.save(listOfStocksWithUpdatedPrices);
In the logs I see N inserts.
I haven't seen similar behavior when the id was numeric.
P.S. Setting the id's type to numeric is not the best solution in my case.
UPDATE1:
I forgot to mention that there is also a Trade class that has a many-to-many relation with Stock:
@Table
@Entity
public class Trade {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Integer id;

    @Column
    @Enumerated(EnumType.STRING)
    private TradeType type;

    @Column
    @Enumerated(EnumType.STRING)
    private TradeState state;

    @MapKey(name = "index")
    @ManyToMany(fetch = FetchType.EAGER)
    @JoinTable(name = "trade_stock",
        joinColumns = { @JoinColumn(name = "id", referencedColumnName = "id") },
        inverseJoinColumns = { @JoinColumn(name = "stock_index", referencedColumnName = "index") })
    private Map<String, Stock> stocks = new HashMap<>();
}
UPDATE2:
I added a many-to-many relation on the Stock side:
@ManyToMany(cascade = CascadeType.ALL, mappedBy = "stocks") // lazy by default
Set<Trade> trades = new HashSet<>();
But now it left joins trades (though they're lazy) and all the trades' collections (they are lazy too). Moreover, the generated Stock::toString method throws a LazyInitializationException.
Related answer: JPA eager fetch does not join
You basically need to set @Fetch(FetchMode.JOIN), because fetch = FetchType.EAGER just specifies that the relationship will be loaded, not how.
Also, what might help with your problem is the @BatchSize annotation, which specifies how many lazy collections will be loaded when the first one is requested. For example, if you have 100 trades in memory (with stocks not initialized), @BatchSize(size = 50) will make sure that only 2 queries are used, effectively changing n+1 to (n+1)/50.
https://docs.jboss.org/hibernate/orm/4.3/javadocs/org/hibernate/annotations/BatchSize.html
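For illustration, a sketch of the annotation applied to the inverse side added in UPDATE2, assuming Hibernate as the provider (@BatchSize is Hibernate-specific, from org.hibernate.annotations):

// in Stock; when one lazy trades collection is accessed, Hibernate
// initializes it for up to 50 Stock instances in the session at once
@BatchSize(size = 50)
@ManyToMany(cascade = CascadeType.ALL, mappedBy = "stocks")
private Set<Trade> trades = new HashSet<>();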
Regarding the inserts, you may want to set the hibernate.jdbc.batch_size property, and set order_inserts and order_updates to true as well.
https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
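In a Spring Boot style setup these would typically go into application.properties, for example (a sketch; the spring.jpa.properties prefix is an assumption about how Hibernate is configured here):

spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true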
However, the generated Stock::toString method throws a LazyInitializationException.
Okay, from this I am assuming you have generated toString() (and most likely equals() and hashCode() methods) using either Lombok or an IDE generator, based on all fields of your class.
Do not override equals(), hashCode() and toString() in this way in a JPA environment, as it has the potential to (a) trigger the exception you have seen if toString() accesses a lazily loaded collection outside of a transaction, and (b) trigger the loading of extremely large volumes of data when used within a transaction. Write a sensible toString() that does not involve associations, and implement equals() and hashCode() using (a) some business key if one is available, (b) the ID (being aware of possible issues with this approach), or (c) do not override them at all.
So firstly, remove these generated methods and see if that improves things a bit.
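A sketch of safer replacements for Stock, assuming the index field is a stable, user-assigned key (illustrative code, not from the original answer):

@Override
public String toString() {
    // no associations touched, so safe outside a transaction
    return "Stock{index='" + index + "', price=" + price + "}";
}

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Stock)) return false;
    // equality on the identifier only, never on lazy collections
    return index.equals(((Stock) o).index);
}

@Override
public int hashCode() {
    return index.hashCode();
}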
With regards to the inserts, I do notice one thing that is often overlooked in JPA. I don't know which database you use, but you have to be careful with
@GeneratedValue(strategy = GenerationType.AUTO)
For MySQL I think all JPA implementations map this to an auto_incremented field, and once you know how JPA works, this has two implications.
Every insert will consist of two queries: first the insert, and then a select query (LAST_INSERT_ID for MySQL) to get the generated primary key.
It also prevents any batch query optimization, because each row needs its own insert statement.
If you insert a large number of objects and you want good performance, I would recommend using table-generated sequences, where you let JPA pre-allocate IDs in large chunks; this also allows the SQL driver to do batched Insert into (...) VALUES (...) optimizations.
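A sketch of such a generator using JPA's table strategy (the generator and table names here are made up; allocationSize is what enables the pre-allocation):

@Id
@GeneratedValue(strategy = GenerationType.TABLE, generator = "trade_gen")
@TableGenerator(name = "trade_gen", table = "id_gen",
                pkColumnName = "gen_name", valueColumnName = "gen_value",
                allocationSize = 50) // hands out 50 ids per database round trip
private Integer id;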
Another recommendation (not everyone agrees with me on this one): personally I never use ManyToMany. I always decompose it into OneToMany and ManyToOne with the join table as a real entity. I like the added control it gives over cascading and fetching, and you avoid some of the ManyToMany traps that exist with bidirectional relations.
At the moment I am developing a Spring Boot application which mainly pulls product review data from a message queue (~5 concurrent consumers) and stores it in a MySQL DB. Each review can be uniquely identified by its reviewIdentifier (String), which is the primary key, and can belong to one or more products (e.g. products with different colors). Here is an excerpt of the data model:
@Entity
public class ProductPlacement implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "product_placement_id")
    private long id;

    @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "productPlacements")
    private Set<CustomerReview> customerReviews;
}
@Entity
public class CustomerReview implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "customer_review_id")
    private String reviewIdentifier;

    @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    @JoinTable(
        name = "tb_miner_review_to_product",
        joinColumns = @JoinColumn(name = "customer_review_id"),
        inverseJoinColumns = @JoinColumn(name = "product_placement_id")
    )
    private Set<ProductPlacement> productPlacements;
}
One message from the queue contains 1 - 15 reviews and a productPlacementId. Now I want an efficient method to persist the reviews for the product. There are basically two cases which need to be considered for each incoming review:
The review is not in the database -> insert the review with a reference to the product that is contained in the message.
The review is already in the database -> just add the product reference to the Set productPlacements of the existing review.
Currently my method for persisting the reviews is not optimal. It looks as follows (uses Spring Data JpaRepositories):
@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    for (CustomerReview review : customerReviews) {
        CustomerReview cr = customerReviewRepository.findOne(review.getReviewIdentifier());
        if (cr != null) {
            cr.getProductPlacements().add(placement);
            customerReviewRepository.saveAndFlush(cr);
        } else {
            Set<ProductPlacement> productPlacements = new HashSet<>();
            productPlacements.add(placement);
            review.setProductPlacements(productPlacements);
            cr = review;
            customerReviewRepository.saveAndFlush(cr);
        }
    }
}
Questions:
I sometimes get ConstraintViolationExceptions because of violating the unique constraint on the reviewIdentifier. This is obviously because I (concurrently) check whether the review is already present and then insert or update it. How can I avoid that?
Is it better to use save() or saveAndFlush() in my case? I get ~50-80 reviews per second. Will Hibernate flush automatically if I just use save(), or will it result in greatly increased memory usage?
Update to question 1: Would a simple @Lock on my review repository prevent the unique-constraint exception?
@Lock(LockModeType.PESSIMISTIC_WRITE)
CustomerReview findByReviewIdentifier(String reviewIdentifier);
What happens when findByReviewIdentifier returns null? Can Hibernate lock the reviewIdentifier for a potential insert even if the method returns null?
Thank you!
From a performance point of view, I would consider evaluating the solution with the following changes.
Changing from bidirectional ManyToMany to bidirectional OneToMany
I had the same question about which one is more efficient in terms of the DML statements that get executed. Quoting from Typical ManyToMany mapping versus two OneToMany:
The option one might be simpler from a configuration perspective, but it yields less efficient DML statements.
Use the second option because whenever the associations are controlled by @ManyToOne associations, the DML statements are always the most efficient ones.
Enable the batching of DML statements
Enabling batching support would result in fewer round trips to the database to insert/update the same number of records.
Quoting from batch INSERT and UPDATE statements
hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true
hibernate.jdbc.batch_versioned_data = true
Reduce the number of saveAndFlush calls
The current code gets the ProductPlacement and for each review does a saveAndFlush, which results in no batching of DML statements.
Instead I would consider loading the ProductPlacement entity, adding the incoming List<CustomerReview> customerReviews to the Set<CustomerReview> customerReviews field of the ProductPlacement entity, and finally calling the merge method once at the end, with the two changes below (a sketch follows the list):
Making the ProductPlacement entity the owner of the association, i.e. by moving the mappedBy attribute onto the Set<ProductPlacement> productPlacements field of the CustomerReview entity.
Making the CustomerReview entity implement the equals and hashCode methods using the reviewIdentifier field. I believe reviewIdentifier is unique and user-assigned.
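Under those assumptions, the reworked method could look roughly like this (a sketch, not the original poster's code; it relies on the cascade already declared on the association and on Spring Data's save delegating to merge for entities with assigned ids):

@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    for (CustomerReview review : customerReviews) {
        // equals/hashCode on reviewIdentifier makes duplicates collapse in the Set
        placement.getCustomerReviews().add(review);
    }
    // one save (merge) at the end instead of one flush per review, which
    // gives Hibernate a chance to batch the resulting DML statements
    productPlacementRepository.save(placement);
}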
Finally, as you do the performance tuning with these changes, baseline your performance with the current code. Then make the changes and compare whether they really result in any significant performance improvement for your solution.
I have a Group entity that has a list of User entities in a many-to-many relationship. It is mapped by a typical join table containing the two IDs. This list may be very large; a million or more users in a group.
I need to add a new user to the group, typically that will be something like
group.getUsers().add(user);
user.getGroups().add(group);
em.merge(group);
em.merge(user);
If I understand typical JPA operation, will this require pulling down the entire list of 1 million+ users into the collection in order to add the new user and then save? That doesn't sound very scalable to me.
Should I simply not be defining this relationship in JPA? Should I be manipulating the join table entries directly in a case like this?
Please forgive the loose syntax. I'm actually using Spring Data JPA, so I don't use the entity manager directly very often, but the question seems general to JPA so I wanted to pose it that way.
Design your models like this and use UserGroup for the associations.
@Entity
public class User {
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "user", fetch = FetchType.LAZY)
    @OnDelete(action = OnDeleteAction.CASCADE)
    private Set<UserGroup> userGroups = new HashSet<UserGroup>();
}

@Entity
@Table(name = "user_group",
       uniqueConstraints = {@UniqueConstraint(columnNames = {"user_id", "group_id"})})
public class UserGroup {
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id", nullable = false)
    @ForeignKey(name = "usergroup_user_fkey")
    private User user;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "group_id", nullable = false)
    @ForeignKey(name = "usergroup_group_fkey")
    private Group group;
}

@Entity
public class Group {
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "group", fetch = FetchType.LAZY)
    @OnDelete(action = OnDeleteAction.CASCADE)
    private Set<UserGroup> userGroups = new HashSet<UserGroup>();
}
Use it like this:
User user = findUserId(id); // all groups won't be loaded, they are marked lazy
Group group = findGroupId(id); // all users won't be loaded, they are marked lazy
UserGroup userGroup = new UserGroup();
userGroup.setUser(user);
userGroup.setGroup(group);
em.persist(userGroup); // persist the join entity directly
Using the ManyToMany mapping effectively caches the collection in the entity, so you might not want to do this for large collections, as displaying it or passing the entity around with the collection triggered will kill performance.
Instead you might remove the mapping on both sides and create an entity for the relation table that you can use in queries when you do need to access the relationship. Using an intermediate entity will allow you to use paging and cursors, so that you can limit the data that might be brought back into usable chunks, and you can insert a new entity to represent new relationships with ease.
EclipseLink's attribute change tracking, though, does allow adding to collections without triggering the relationship, along with other performance enhancements. This is enabled with weaving and is available on collection types that do not maintain order.
The collection classes returned by getUsers() and getGroups() don't have to have their contents resident in memory, and if you have lazy fetching turned on, as I assume you do for such a large relationship, the persistence provider should be smart enough to recognize that you're not trying to read the contents but just adding a value. (Similarly, calling size() on the collection will typically cause a SQL COUNT query rather than actually loading and counting the elements.)
I have two tables: t_promo_program and t_promo_program_param.
They are represented by the following JPA entities:
@Entity
@Table(name = "t_promo_program")
public class PromoProgram {
    @Id
    @Column(name = "promo_program_id")
    private Long id;

    @OneToMany(cascade = {CascadeType.REMOVE})
    @JoinColumn(name = "promo_program_id")
    private List<PromoProgramParam> params;
}

@Entity
@Table(name = "t_promo_program_param")
public class PromoProgramParam {
    @Id
    @Column(name = "promo_program_param_id")
    private Long id;

    // @NotNull // This is a Hibernate annotation so that my test db gets created with the NOT NULL attribute; I'm not married to this annotation.
    @ManyToOne
    @JoinColumn(name = "PROMO_PROGRAM_ID", referencedColumnName = "promo_program_id")
    private PromoProgram promoProgram;
}
When I delete a PromoProgram, Hibernate hits my database with:
update T_PROMO_PROGRAM_PARAM
set promo_program_id = null
where promo_program_id = ?

delete from t_promo_program
where promo_program_id = ?
and last_change = ?
I'm at a loss for where to start looking for the source of the problem.
Oh crud, it was a missing "mappedBy" field in PromoProgram.
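For reference, the corrected mapping would look roughly like this on the PromoProgram side (a sketch based on the entities above; with mappedBy pointing at the existing @ManyToOne field, Hibernate stops managing the join column from both sides, which is what produced the set-to-null update):

@OneToMany(mappedBy = "promoProgram", cascade = {CascadeType.REMOVE})
private List<PromoProgramParam> params;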
Double-check whether you're maintaining bidirectional association consistency. That is, make sure that all PromoProgramParam entities that link to a PromoProgram as their parent are also contained in said parent's params list. It's a good idea to make sure this happens regardless of which side "initiates" the association: if setPromoProgram is called on a PromoProgramParam, have the setter automatically add itself to the PromoProgram's params list; vice versa, when calling addPromoProgramParam on a PromoProgram, have it set itself as the param's parent. A sketch of such helpers follows below.
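Such consistency helpers could look like this (a sketch; the helper and getter names are assumptions):

// in PromoProgram
public void addPromoProgramParam(PromoProgramParam param) {
    params.add(param);
    param.setPromoProgram(this); // keep the owning side in sync
}

// in PromoProgramParam
public void setPromoProgram(PromoProgram promoProgram) {
    this.promoProgram = promoProgram;
    if (promoProgram != null && !promoProgram.getParams().contains(this)) {
        promoProgram.getParams().add(this); // keep the parent's list in sync
    }
}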
I've encountered this problem before as well, and it was due to not maintaining bidirectional consistency. I debugged into Hibernate and found that it was unable to cascade the delete operation to the children because they weren't in the list. However, they were most certainly present in the database, and caused FK exceptions as Hibernate tried to delete only the parent without first deleting its children (which you've likely also encountered with the @NotNull in place).
FYI, I believe the proper "EJB 3.0" way of making the PromoProgramParam.promoProgram field (say that 100 times) non-nullable is to set the optional = false attribute on the @ManyToOne annotation.
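That would look something like this (nullable = false on the join column additionally puts the NOT NULL constraint in the generated DDL):

@ManyToOne(optional = false)
@JoinColumn(name = "PROMO_PROGRAM_ID", referencedColumnName = "promo_program_id", nullable = false)
private PromoProgram promoProgram;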