Java: Optimally handling many-to-many relation with huge data sets


Here is my issue.
I have a huge amount of data in many-to-many tables that I have to update quite often. I also need to query it with dynamic filters and ordering, and by location (PostGIS database).
Shop entity:
public class Shop extends BaseEntity {
    ....
    @ManyToOne
    @JsonIdentityReference(alwaysAsId = true)
    @JsonSerialize(using = IdSerializer.class)
    @JoinColumn(name = "parent_id")
    private Shop parent;
    @ManyToMany(mappedBy = "shops")
    @JsonIdentityReference(alwaysAsId = true)
    @JsonIgnore
    private List<Product> products = new ArrayList<>();
    ...
}
Product entity:
public class Product extends BaseEntity {
    ...
    @ManyToMany
    @JsonIdentityReference(alwaysAsId = true)
    @JoinTable(name = "product_shop",
        joinColumns = @JoinColumn(name = "product_id"),
        inverseJoinColumns = @JoinColumn(name = "shop_id"))
    private List<Shop> shops = new ArrayList<>();
    ...
}
The product table has about 1 million rows and the shop table about 8,000. Hence the many-to-many table holds up to 8*10^9 rows in the worst case, which results in really long request times.
Any idea how to solve this performantly, keeping in mind that I want to sort the shops according to a location field?
Current tech stack:
Java 8
Spring Boot
Hibernate / JPA
PostgreSQL / PostGIS
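The worst-case figure in the question is simply the product of the two table sizes. A minimal sketch of that arithmetic follows; the class name and the bytes-per-row parameter are hypothetical and only there to make the estimate reusable, they are not measured values.

```java
// Back-of-the-envelope sizing for the product_shop join table.
final class JoinTableEstimate {

    private JoinTableEstimate() {
    }

    /** Worst case: every product is linked to every shop. */
    static long worstCaseRows(long products, long shops) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(products, shops);
    }

    /** Rough storage estimate; bytesPerRow is an assumed, caller-supplied value. */
    static long estimatedBytes(long rows, long bytesPerRow) {
        return Math.multiplyExact(rows, bytesPerRow);
    }
}
```

Calling JoinTableEstimate.worstCaseRows(1_000_000L, 8_000L) gives 8,000,000,000, which matches the 8*10^9 rows mentioned above.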

Related

Spring Data JPA - Delete many to many entries

I am attempting to remove entries from a many to many relationship using Spring Data JPA. One of the models is the owner of the relationship and I need to remove entries of the non-owner entity. These are the models:
Workflow entity
@Entity(name = "workflows")
public class Workflow {
    @Id
    @Column(name = "workflow_id", updatable = false, nullable = false)
    @GeneratedValue(strategy = GenerationType.AUTO)
    private UUID workflowId;
    @ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE })
    @JoinTable(name = "workflow_data",
        joinColumns = @JoinColumn(name = "workflow_id", referencedColumnName = "workflow_id"),
        inverseJoinColumns = @JoinColumn(name = "data_upload_id", referencedColumnName = "data_upload_id"))
    private Set<DataUpload> dataUploads = new HashSet<>();
    // Setters and getters...
}
DataUpload entity
@Entity(name = "data_uploads")
public class DataUpload {
    @Id
    @Column(name = "data_upload_id")
    private UUID dataUploadId;
    @ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, mappedBy = "dataUploads")
    private Set<Workflow> workflows = new HashSet<>();
    // Setters and getters...
}
DataUpload repository
@Repository
public interface DataUploadsRepository extends JpaRepository<DataUpload, UUID> {
    @Transactional
    void delete(DataUpload dataUpload);
    Optional<DataUpload> findByDataUploadId(UUID dataUploadId);
}
To delete data uploads, I have tried two ways of calling the repository:
First version
dataUploadsRepository.deleteAll(workflow.getDataUploads());
Second version
workflow.getDataUploads().stream()
    .map(DataUpload::getDataUploadId)
    .map(dataUploadsRepository::findByDataUploadId)
    .filter(Optional::isPresent)
    .map(Optional::get)
    .forEach(dataUploadsRepository::delete);
The problem is that Spring Data JPA removes neither the DataUploads nor the entries of the association table workflow_data.
How can I tell Spring Data to remove from both data_uploads and workflow_data (association table)?
I would appreciate any help.

I found the solution for this problem. Basically, both entities (in my case) need to be the owner of the relationship and the data from the association table must be deleted first.
Workflow entity (relationship owner)
@Entity(name = "workflows")
public class Workflow {
    @Id
    @Column(name = "workflow_id", updatable = false, nullable = false)
    @GeneratedValue(strategy = GenerationType.AUTO)
    private UUID workflowId;
    @ManyToMany(cascade = { CascadeType.ALL })
    @JoinTable(name = "workflow_data",
        joinColumns = @JoinColumn(name = "workflow_id", referencedColumnName = "workflow_id"),
        inverseJoinColumns = @JoinColumn(name = "data_upload_id", referencedColumnName = "data_upload_id"))
    private Set<DataUpload> dataUploads = new HashSet<>();
    // Setters and getters...
}
DataUpload entity (relationship owner)
@Entity(name = "data_uploads")
public class DataUpload {
    @Id
    @Column(name = "data_upload_id")
    private UUID dataUploadId;
    @ManyToMany
    @JoinTable(name = "workflow_data",
        joinColumns = @JoinColumn(name = "data_upload_id", referencedColumnName = "data_upload_id"),
        inverseJoinColumns = @JoinColumn(name = "workflow_id", referencedColumnName = "workflow_id"))
    private Set<Workflow> workflows = new HashSet<>();
    // Setters and getters...
}
Notice that Workflow has ALL as cascade type, since (based on the logic I need), I want Spring Data JPA to remove, merge, refresh, persist and detach DataUploads when modifying workflows. On the other hand, DataUpload does not have cascade type, as I do not want Workflow instances (and records) to be affected due to DataUploads deletions.
In order to successfully delete DataUploads, the associated data has to be deleted first:
public void deleteDataUploads(Workflow workflow) {
    for (Iterator<DataUpload> dataUploadIterator = workflow.getDataUploads().iterator(); dataUploadIterator.hasNext();) {
        DataUpload dataUploadEntry = dataUploadIterator.next();
        // Unlink first, so the row in workflow_data goes away...
        dataUploadIterator.remove();
        // ...then delete the DataUpload itself.
        dataUploadsRepository.delete(dataUploadEntry);
    }
}
dataUploadIterator.remove() deletes records from the association table (workflow_data) and then the DataUpload is deleted with dataUploadsRepository.delete(dataUploadEntry);.
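The loop above relies on removing elements through the iterator, which is the safe way to shrink a collection while walking it. The following is a minimal in-memory sketch of that pattern only; it involves no JPA, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// In-memory stand-in for the deletion loop; no JPA involved.
final class IteratorRemovalSketch {

    private IteratorRemovalSketch() {
    }

    /**
     * Empties the given set through its iterator and returns the removed
     * elements, mirroring the "unlink first, then delete" order of the answer.
     */
    static <T> List<T> drain(Set<T> owned) {
        List<T> removed = new ArrayList<>();
        for (Iterator<T> it = owned.iterator(); it.hasNext();) {
            T entry = it.next();
            it.remove();        // structural change made through the iterator
            removed.add(entry); // stands in for repository.delete(entry)
        }
        return removed;
    }
}
```

After drain(...) returns, the set is empty and the returned list holds every element that was unlinked, in iteration order.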

It has been a while since I had to fix this kind of mapping, so I'm not going to give you a code fix. Instead, maybe I can give you another perspective.
First some questions: do you really need a many-to-many? Does it make sense for either of those entities to exist without the other one? Can a DataUpload exist by itself?
In these mappings you are supposed to unassign the relationship on both sides. Keep in mind that you can always execute a query to remove the actual values (a query against the entity and against the intermediate table as well).
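Unassigning on both sides could look roughly like the sketch below. It is a plain-Java model with the JPA annotations left out; WorkflowModel, DataUploadModel and their helper methods are hypothetical names, not part of the original entities.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java model of a bidirectional many-to-many; annotations omitted.
final class WorkflowModel {
    final Set<DataUploadModel> dataUploads = new HashSet<>();

    /** Links both sides so the object graph stays consistent. */
    void addDataUpload(DataUploadModel upload) {
        dataUploads.add(upload);
        upload.workflows.add(this);
    }

    /** Unlinks both sides. */
    void removeDataUpload(DataUploadModel upload) {
        dataUploads.remove(upload);
        upload.workflows.remove(this);
    }
}

final class DataUploadModel {
    final Set<WorkflowModel> workflows = new HashSet<>();

    /** Detaches this upload from every workflow that references it. */
    void detachFromAllWorkflows() {
        // Iterate over a copy: removeDataUpload mutates this.workflows.
        for (WorkflowModel workflow : new HashSet<>(workflows)) {
            workflow.removeDataUpload(this);
        }
    }
}
```

Calling detachFromAllWorkflows() before deleting an upload leaves no workflow pointing at it, which is the in-memory equivalent of clearing its rows in the association table.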
A couple of links that I hope can be useful to you, they explain the mappings best practices and different ways to do the deletion.
Delete Many, Delete Many to Many, Best way to use many to many.

Self-referencing ManyToMany relations in Hibernate

I am looking for a way to model a relation between two or more objects of the same class in Hibernate. For example: I want to create a situation where I want to model relations between persons. In the database, I have two tables:
Person:
Id
Name
Relation:
Parent Id
Child Id
I tried to model this in Hibernate as a Person with two ManyToMany relations (the Getter/Setter annotations are Lombok):
@Getter
@Setter
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Entity
@Table(name = "persons")
public class Person {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    @Column(name = "name")
    private String name;
    @ManyToMany(fetch = FetchType.LAZY)
    @JoinTable(
        name = "relations",
        joinColumns = {@JoinColumn(name = "parent_id", nullable = false)},
        inverseJoinColumns = {@JoinColumn(name = "child_id", nullable = false)}
    )
    private Set<Person> children = new HashSet<>();
    @ManyToMany(fetch = FetchType.LAZY)
    @JoinTable(
        name = "relations",
        joinColumns = {@JoinColumn(name = "child_id", nullable = false)},
        inverseJoinColumns = {@JoinColumn(name = "parent_id", nullable = false)}
    )
    private Set<Person> parents = new HashSet<>();
}
This gave me the following problems:
With fetch type "LAZY" Hibernate complains about not having a Session when calling person.getChildren() or person.getParents()
With fetch type "EAGER" Hibernate returns null for the sets, which causes NullPointerExceptions when trying to add children or parents. The other thing I am worried about is possible endless recursive eager fetching.
To get around this, I've tried to model the Relation as a class, so that I can use JPA queries in the PersonRepository to find children and parents without having to mess with the intricacies of ManyToMany:
public interface PersonRepository extends JpaRepository<Person, Long> {
    // Join on the parent side so that the parents of the given child are returned.
    @Query(
        "select p from Person p join Relation r on p.id = r.parentId where r.childId = :childId"
    )
    List<Person> findParents(@Param("childId") Long childId);
}
This caused Hibernate to complain that Relation does not have an @Id field. I do not want to add that because the relation table in the database should model the relation, and not have an id of its own.
When trying to search online for similar structures in Hibernate I usually find classic many-to-many examples in the documentation and questions like this one where there is a relation between two different classes instead of a relation between two objects of the same class.
I'm looking for a way to model a relation between two or more objects of the same class in Hibernate. I want to be able to fetch relations lazily, or fetch them afterwards for performance reasons.
It would be nice if the relation table could have an extra field "type" which indicates the type of relation between Persons (child, parent, nephew, friend), so that there is room for new relation types without too many changes to database and code.
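One way to picture a typed relation without a surrogate id is to treat (fromId, toId, type) as the natural key of each row. The sketch below is a plain-Java model under that assumption; RelationType, Relation and RelationQueries are hypothetical names, and the direction convention (fromId is the parent of toId for PARENT rows) is an assumption as well. In JPA such a composite key is usually mapped with @EmbeddedId or @IdClass, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical relation kinds, taken from the examples in the question.
enum RelationType { PARENT, NEPHEW, FRIEND }

/** One row of the relations table; (fromId, toId, type) is the natural key. */
final class Relation {
    final long fromId;
    final long toId;
    final RelationType type;

    Relation(long fromId, long toId, RelationType type) {
        this.fromId = fromId;
        this.toId = toId;
        this.type = Objects.requireNonNull(type);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Relation)) {
            return false;
        }
        Relation r = (Relation) o;
        return fromId == r.fromId && toId == r.toId && type == r.type;
    }

    @Override
    public int hashCode() {
        return Objects.hash(fromId, toId, type);
    }
}

final class RelationQueries {

    private RelationQueries() {
    }

    /** Ids of all persons recorded as PARENT of the given child. */
    static List<Long> findParentIds(List<Relation> relations, long childId) {
        List<Long> parents = new ArrayList<>();
        for (Relation r : relations) {
            if (r.type == RelationType.PARENT && r.toId == childId) {
                parents.add(r.fromId);
            }
        }
        return parents;
    }
}
```

Adding a new kind of relation then only means adding an enum constant; the row shape and the lookup code stay the same.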

Not sure I understand you correctly, but I had a similar case once.
What I did was to persist the child/parent without its relations and then update it with its relations afterwards (still in the same transaction).
private void insertEntity(final AbstractEntity entity) {
    // Remember the id up front so that cyclic relations do not recurse forever.
    insertedEntities.add(entity.getId());
    final List<AbstractEntity> relations = entity.onBeforeInsertion();
    for (final AbstractEntity relation : relations) {
        if (!insertedEntities.contains(relation.getId())) {
            insertEntity(relation);
        }
    }
    final RelationBundle relationBundle = new RelationBundle();
    entity.onInsertion(relationBundle);
    // Save the bare entity first, then attach its relations and update it.
    immediatelySaveNewEntityThroughProxy(entity);
    for (final AbstractEntity relation : relations) {
        entity.onRelationInsertion(relation, relationBundle);
    }
    dao.saveOrUpdate(entity);
}
private void immediatelySaveNewEntityThroughProxy(final AbstractEntity entity) {
    proxiedAccess().immediatelySaveNewEntity(entity);
}
// Fetch this bean through the context so that @Transactional applies to the call.
private MyConsumer proxiedAccess() {
    return applicationContext.getBean(getClass());
}
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void immediatelySaveNewEntity(final AbstractEntity entity) {
    try {
        if (!dao.entityExistsFromId((int) entity.getId())) {
            dao.save(entity);
        }
    } catch (final Exception e) {
        LOGGER.error("Error saving entity: {}", entity.getId(), e);
    }
}

Load lazy collection in ManyToMany relationship

I would like to query all products for a company. The products should be loaded with their list of countries. I managed to write a Spring Data JPA repository method that queries what I want, but I wonder why I need a DISTINCT clause.
If I run the following query, I get one result per country. So if a product has 3 countries, the query returns the same product 3 times. Can you explain why?
@EntityGraph(attributePaths = "countries", type = EntityGraph.EntityGraphType.LOAD)
List<Product> findByCompanyIdOrderByIdAsc(Long companyId);
To fix that issue I added a Distinct clause, which returns what I want.
@EntityGraph(attributePaths = "countries", type = EntityGraph.EntityGraphType.LOAD)
List<Product> findDistinctByCompanyIdOrderByIdAsc(Long companyId);
I have the same issue if I run the JPQL query SELECT p FROM Product p LEFT JOIN FETCH p.countries WHERE p.company.id = ?1, which is equivalent to findByCompanyIdOrderByIdAsc.
The entities:
public class Product implements Serializable {
    @ManyToOne
    @JsonIgnore
    private Company company;
    @ManyToMany
    @Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
    @JoinTable(name = "product_country",
        joinColumns = @JoinColumn(name = "product_id", referencedColumnName = "id"),
        inverseJoinColumns = @JoinColumn(name = "country_id", referencedColumnName = "id"))
    @JsonIdentityInfo(generator = ObjectIdGenerators.PropertyGenerator.class, property = "id",
        resolver = EntityIdResolver.class, scope = Country.class)
    @JsonIdentityReference(alwaysAsId = true)
    private Set<Country> countries = new HashSet<>();
}
public class Country implements Serializable {
}
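The duplication described in this question follows from how a join works: the SQL result has one row per (product, country) pair, and without DISTINCT each row contributes one reference to the root entity. The sketch below simulates this with plain strings; it is not how Hibernate is implemented, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Simulates the root column of a "product LEFT JOIN countries" result.
final class JoinFetchRows {

    private JoinFetchRows() {
    }

    /** One root reference per joined row: a product with 3 countries appears 3 times. */
    static List<String> rootPerRow(String product, List<String> countries) {
        List<String> roots = new ArrayList<>();
        if (countries.isEmpty()) {
            roots.add(product); // LEFT JOIN keeps products that have no countries
        }
        for (int i = 0; i < countries.size(); i++) {
            roots.add(product);
        }
        return roots;
    }

    /** The effect DISTINCT has on the root list: duplicates dropped, order kept. */
    static <T> List<T> distinctInOrder(List<T> roots) {
        return new ArrayList<>(new LinkedHashSet<>(roots));
    }
}
```

For a product with three countries, rootPerRow returns three references to the same product, and distinctInOrder collapses them back to one while preserving the ORDER BY sequence.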

Hibernate ManyToMany results in cartesian product

I am a newbie in Hibernate/JPA, so apologies for this noob question. I have the following relations between entities:
Manager has a many-to-many relation with Worker
Worker has a many-to-many relation with Task
Following are my Java classes
@Entity
@Table(name = "manager")
public class Manager {
    @Id
    private long id;
    private String name;
    @ManyToMany(fetch = FetchType.EAGER)
    @JoinTable(name = "manager_worker", joinColumns = { @JoinColumn(name = "manager_id") },
        inverseJoinColumns = { @JoinColumn(name = "worker_id") })
    private ArrayList<Worker> workers = new ArrayList<Worker>();
    // ......getter setter
}
@Entity
@Table(name = "worker")
public class Worker {
    @Id
    private long id;
    private String name;
    @ManyToMany(fetch = FetchType.EAGER)
    @JoinTable(name = "worker_task", joinColumns = { @JoinColumn(name = "worker_id") },
        inverseJoinColumns = { @JoinColumn(name = "task_id") })
    private ArrayList<Task> tasks = new ArrayList<Task>();
    ...........................
}
@Entity
@Table(name = "task")
public class Task {
    @Id
    private long id;
    private String title;
    .................
}
I have the following data:
Manager M1 has worker W1 and W2
W1 has tasks TW1, TW2, TW3
W2 has tasks TW2 and TW2
When I get the Manager object for id M1, the result contains the cartesian product of Worker and Task: W1 is repeated 3 times and W2 is repeated 2 times, so the Manager's workers list has 5 Worker objects instead of 2.
To load the data I am using the Session.get() method
public E find(final K key) {
    return (E) currentSession().get(daoType, key);
}
Can anyone please tell me how I can fix this, and point me to any best practices that I should use in this case?
Thanks

This happens because both @ManyToMany relationships are eagerly loaded.
This causes an outer join over all three tables. The result of this query is five rows, in which W1 appears three times and W2 two times. So far everything is correct.
But Hibernate then puts every worker row into the collection, even the duplicates. This is even described in the FAQ.
There are multiple options to resolve this issue:
change one of the relationships to lazy loading
use a Set instead of a List (you shouldn't use ArrayList as the declared type of a variable anyway, especially when using Hibernate)
use @Fetch(FetchMode.SELECT) for tasks
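The five rows mentioned in this answer can be reproduced with a small nested-loop join over in-memory data. This is only an illustration of the row multiplication and of the List versus Set difference, not of Hibernate internals; EagerJoinSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Nested-loop version of manager_worker JOIN worker_task for a single manager.
final class EagerJoinSketch {

    private EagerJoinSketch() {
    }

    /** Returns the worker column of every joined row. */
    static List<String> workerColumn(List<String> workers, Map<String, List<String>> tasksByWorker) {
        List<String> column = new ArrayList<>();
        for (String worker : workers) {
            List<String> tasks = tasksByWorker.getOrDefault(worker, Collections.<String>emptyList());
            if (tasks.isEmpty()) {
                column.add(worker); // outer join: a worker without tasks still yields one row
            }
            for (int i = 0; i < tasks.size(); i++) {
                column.add(worker); // one row per (worker, task) pair
            }
        }
        return column;
    }

    /** Collecting the same rows into a Set removes the repeats. */
    static Set<String> asSet(List<String> column) {
        return new LinkedHashSet<>(column);
    }
}
```

With W1 mapped to three tasks and W2 mapped to two, workerColumn yields five entries (W1 three times, W2 twice), while asSet reduces them to the two distinct workers. That is the reason the Set option in the list above makes the symptom disappear.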

Where is your targetEntity in the @ManyToMany relationship?
You should have something like:
@ManyToMany(targetEntity = Task.class)
@JoinTable(name = "worker_task", joinColumns = { @JoinColumn(name = "worker_id") },
    inverseJoinColumns = { @JoinColumn(name = "task_id") })
private ArrayList<Task> tasks = new ArrayList<Task>();
I suggest you change to @OneToMany and @ManyToOne relationships anyway. That concept is more compatible with the database design and is easier to understand when a person looks at the ERD. So Manager has a one-to-many relationship with Manager_Worker, and thus Manager_Worker has a many-to-one relationship with Manager. Do the same for the rest of the entities.

Optional Many to Many relationship in Hibernate

I want to create an M:N relationship as below
Each user can have zero or many ebooks
Each ebook must belong to one or many users
My mappings in Hibernate:
User.java
@Entity
@Table(name = "USERS")
public class User {
    //...
    @ManyToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
    @JoinTable(name = "USER_EBOOK", joinColumns = @JoinColumn(name = "USER_ID", nullable = false),
        inverseJoinColumns = @JoinColumn(name = "EBOOK_ID", nullable = false))
    private List<Ebook> listOfEbooks = new ArrayList<Ebook>();
    //...
}
Ebook.java
@Entity
@Table(name = "EBOOK")
public class Ebook {
    //...
    @ManyToMany(mappedBy = "listOfEbooks", fetch = FetchType.EAGER)
    @NotFound(action = NotFoundAction.EXCEPTION)
    private List<User> listOfEbookUsers = new ArrayList<User>();
    //...
}
How can I add these additional constraints (one or many on one side, zero or many on the other)? When I save only an ebook object to the database, I end up with an ebook that does not belong to anyone.
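The mapping shown in the question does not by itself require at least one USER_EBOOK row per ebook; nullable = false only constrains the columns of rows that exist. One possible approach, sketched here under the assumption that the rule is enforced in application code before saving, is a small guard. EbookOwnership and requireAtLeastOneOwner are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java guard for the rule "each ebook must belong to one or many users".
final class EbookOwnership {

    private EbookOwnership() {
    }

    /**
     * Returns a defensive copy of the owners, or throws if there are none.
     * Intended to be called before the ebook is persisted.
     */
    static <U> List<U> requireAtLeastOneOwner(List<U> owners) {
        if (owners == null || owners.isEmpty()) {
            throw new IllegalStateException("An ebook must belong to at least one user");
        }
        return new ArrayList<>(owners);
    }
}
```

The "zero or many" side needs no check, since an empty list of ebooks is a valid state for a user.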

See this question and the answer of the thread:
Mapping a bidirectional list with Hibernate
And see also this tutorial:
http://viralpatel.net/blogs/hibernate-many-to-many-annotation-mapping-tutorial/
The tutorial provides very good examples of how to implement a proper Many-to-Many mapping.
