Many-to-many relationship in Objectify with strong consistency - java

Being new to Google Cloud Datastore, I would like to make sure that I am on the right track.
What I need:
many-to-many relationship
the relationship itself has to hold data describing the relation
strong consistency is required both ways:
from user entity to all the data entities that this user has permissions to
from data entity to all the users that have permission to it
This is what I came up with:
import com.googlecode.objectify.Key;
import com.googlecode.objectify.Ref;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Load;
import com.googlecode.objectify.annotation.Parent;
import java.util.List;
@Entity
public class User {
    @Id String userId;
}
@Entity
public class PermissionIndex {
    @Id long id;
    @Parent Key<User> user;
    List<Ref<Permission>> permissions;
}
@Entity
public class Permission {
    @Id long id;
    boolean writePermission;
    boolean managePermission;
    @Load Ref<Data> data; // So that Data entities can be retrieved with strong
                          // consistency after looking up Permission entities
                          // for a specific User entity
    @Load Ref<User> user; // So that User entities can be retrieved with strong
                          // consistency after looking up Permission entities
                          // for a specific Data entity
}
@Entity
public class DataIndex {
    @Id long id;
    @Parent Key<Data> data;
    List<Ref<Permission>> permissions;
}
@Entity
public class Data {
    @Id String dataId;
    String actualData; // an entity has exactly one @Id; this is a regular property
}
If I understand correctly, with this implementation the only way to filter the Data entities of a specific user is to get all the Permission entities and filter the Data entities in memory. Am I right?
Is there a better way to implement it still fulfilling the requirements?
UPDATE:
As I understand it, this implementation will let me retrieve data with strong consistency: given a user id, an ancestor query retrieves all the Permission entities, and get_by_key then retrieves the Data entities.
I am wondering whether I should approach it differently, since I do not have a lot of experience with Datastore/Objectify.

There's an important conceptual misunderstanding inherent to the question: Relationships are not strongly or eventually consistent. Queries are.
If you perform a get-by-key operation, the result will be strongly consistent. If you perform a non-ancestor filter query, the result will be eventually consistent. Rephrasing this:
If you navigate your object graph using get-by-key operations, you will see strong consistency. If you navigate your object graph using non-ancestor query filters, you will see eventual consistency.
If you need strong consistency, structure your data so that your queries can be satisfied with get-by-key operations or with ancestor queries.
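To make the last point concrete, here is a minimal plain-Java sketch (no Objectify; every class, key format, and method name is hypothetical) of one way to lay out the question's many-to-many permissions so that both directions are read with get-by-key operations only. It assumes the Permission key is derived from the user id and the data id, and that each side keeps an index entity listing the ids on the other side, similar in spirit to the question's PermissionIndex/DataIndex. A HashMap stands in for the Datastore.

```java
import java.util.*;

// Hypothetical in-memory stand-in for the Datastore: every read below is a
// lookup by key, the access path described above as strongly consistent.
class KeyValueStore {
    private final Map<String, Object> entities = new HashMap<>();

    void put(String key, Object entity) { entities.put(key, entity); }

    <T> T get(String key, Class<T> type) {
        return type.cast(entities.get(key)); // null if nothing is stored under key
    }

    // Batch get-by-key; keys with no stored entity are skipped
    <T> List<T> getAll(Collection<String> keys, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (String key : keys) {
            Object entity = entities.get(key);
            if (entity != null) result.add(type.cast(entity));
        }
        return result;
    }
}

// Relationship entity; its key is built from both ids, so either side can compute it
record Permission(String userId, String dataId, boolean write, boolean manage) {
    static String key(String userId, String dataId) { return "Permission:" + userId + "|" + dataId; }
}

// Index entities: one per user and one per data item, keyed by that id
record UserIndex(String userId, Set<String> dataIds) {
    static String key(String userId) { return "UserIndex:" + userId; }
}

record DataIndex(String dataId, Set<String> userIds) {
    static String key(String dataId) { return "DataIndex:" + dataId; }
}

class PermissionService {
    private final KeyValueStore store;

    PermissionService(KeyValueStore store) { this.store = store; }

    // Writes the Permission and updates both indexes. A real implementation
    // would do these writes in one transaction; that is not modeled here.
    void grant(String userId, String dataId, boolean write, boolean manage) {
        store.put(Permission.key(userId, dataId), new Permission(userId, dataId, write, manage));

        UserIndex userIndex = store.get(UserIndex.key(userId), UserIndex.class);
        if (userIndex == null) {
            userIndex = new UserIndex(userId, new TreeSet<>());
            store.put(UserIndex.key(userId), userIndex);
        }
        userIndex.dataIds().add(dataId);

        DataIndex dataIndex = store.get(DataIndex.key(dataId), DataIndex.class);
        if (dataIndex == null) {
            dataIndex = new DataIndex(dataId, new TreeSet<>());
            store.put(DataIndex.key(dataId), dataIndex);
        }
        dataIndex.userIds().add(userId);
    }

    // User -> permissions: one get for the index, then one batch get
    List<Permission> permissionsForUser(String userId) {
        UserIndex index = store.get(UserIndex.key(userId), UserIndex.class);
        if (index == null) return List.of();
        List<String> keys = new ArrayList<>();
        for (String dataId : index.dataIds()) keys.add(Permission.key(userId, dataId));
        return store.getAll(keys, Permission.class);
    }

    // Data -> permissions: the same pattern in the other direction
    List<Permission> permissionsForData(String dataId) {
        DataIndex index = store.get(DataIndex.key(dataId), DataIndex.class);
        if (index == null) return List.of();
        List<String> keys = new ArrayList<>();
        for (String userId : index.userIds()) keys.add(Permission.key(userId, dataId));
        return store.getAll(keys, Permission.class);
    }
}
```

The trade-off of this layout is that every grant writes three entities, which in a real Datastore should happen in one transaction, and an index entity that lists very many ids can run into the Datastore's entity size limit.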

Your logic seems sound... but for a relational database.
This kind of logic doesn't hold in the Datastore, which is an HRD (High Replication Datastore). There are ways around this, and you have figured them out with the approach you described.
For consistency, your only chance is to use ancestor queries or key lookups. The Datastore is eventually consistent; only with "get_by_key" or with an ancestor query can you "force" consistency.
If you want something closer to SQL, maybe consider cloud GQL?

Related

Hibernate @ManyToOne/@JoinColumn optimization

I have a Hibernate entity that is composed of many other entities used within the application. The other entities that make up this MainEntity are joined using @ManyToOne and @JoinColumn. This MainEntity class has 5 columns (@Column) and 7 @ManyToOne/@JoinColumn entities.
I seem to be running into performance issues when retrieving all of these MainEntity instances. We want to serialize each MainEntity to JSON along with the other entities associated with it. Note that we aren't retrieving that many: fewer than 30 in total.
Below is an example of what the class looks like, along with my findAll() method to retrieve these entities. I know that @ManyToOne is EAGER by default, so I'm wondering if there's a better way to get all of these entities that is easier on the system. Thank you in advance.
@Entity(name = "MainEntity")
@Table(name = "main_entity")
public class MainEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    private Integer id;
    // Other @Columns defined here
    @ManyToOne()
    @JoinColumn(name = "entity_1_id")
    private Entity1 entity1;
    @ManyToOne()
    @JoinColumn(name = "entity_2_id")
    private Entity2 entity2;
    @ManyToOne()
    @JoinColumn(name = "entity_3_id")
    private Entity3 entity3;
    // ... and so on, for a total of 7 @ManyToOne() columns
}
Here is the findAll() method that I have:
final List<E> findAllOrdered(Class<E> clazz, Order order) {
    final Session session = sessionManager.openNewSession();
    try {
        return session.createCriteria(clazz)
                .addOrder(order)
                .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
                .list();
    } finally {
        sessionManager.closeSession(session);
    }
}
I found myself having to add Criteria.DISTINCT_ROOT_ENTITY because we were getting duplicate MainEntity results when a child had multiple associated entities. I suspect this is a big part of my performance problem.
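A note on why DISTINCT_ROOT_ENTITY does not reduce the work: the transformer is applied after the SQL has run. When the query joins a to-many association, the database returns one row per child, so the same root appears several times; Hibernate then drops the repeated root instances in memory, comparing them by object identity. The plain-Java sketch below (RootEntity and JoinedRow are hypothetical stand-ins; no Hibernate involved) mimics only that de-duplication step.

```java
import java.util.*;

// Hypothetical root entity; only object identity matters for this sketch
class RootEntity {
    final int id;
    RootEntity(int id) { this.id = id; }
}

// One row of a joined result: the root instance plus one joined child id
record JoinedRow(RootEntity root, int childId) {}

class DistinctRootSketch {
    // Keeps the first occurrence of each root instance (compared with ==)
    // and preserves the original row order
    static List<RootEntity> distinctRoots(List<JoinedRow> rows) {
        Set<RootEntity> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<RootEntity> result = new ArrayList<>();
        for (JoinedRow row : rows) {
            if (seen.add(row.root())) {
                result.add(row.root());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RootEntity a = new RootEntity(1);
        RootEntity b = new RootEntity(2);
        // Root 1 has three joined children and root 2 has one: four rows in total
        List<JoinedRow> rows = List.of(
                new JoinedRow(a, 10), new JoinedRow(a, 11), new JoinedRow(a, 12), new JoinedRow(b, 20));
        System.out.println(rows.size() + " rows -> " + distinctRoots(rows).size() + " roots");
    }
}
```

Running main prints `4 rows -> 2 roots`: all four rows still have to be fetched, and only the returned list shrinks.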
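If you are retrieving associations you don't want in the response, you can exclude them from serialization with @JsonIgnore (note that this only affects the JSON output; an EAGER association is still loaded from the database).
For example:
@ManyToOne()
@JoinColumn(name = "entity_1_id")
@JsonIgnore
private Entity1 entity1;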
A few pointers to consider:
Consider making associations lazy by default, unless you really want to load all the association data (and its associations) along with the parent.
Use JOIN in HQL/Criteria queries, based on which associations you really want to fetch and how deep.
Or use an EntityGraph to decide which associations to fetch.
Enable show_sql, as this shows the number of SQL statements and the exact SQL being sent to the DB. This is a good starting point; subsequently you can tune your associations to LAZY/EAGER and SELECT/JOIN/SUBSELECT based on your use case.
You can run these queries against the DB and see whether tuning the query or the DB (indexes, partitioning, etc.) helps reduce query times.
See if a second-level cache would help for your use case. Note that a second-level cache comes with its own complexity and extra overhead, especially if the data is transactional rather than mostly read-only. With the application deployed on multiple nodes, maintaining cache coherence is another aspect to think about. You need to validate whether the extra overhead and complexity are really worth the efficiency gained from the second-level cache.
From an application design perspective, you can also consider whether you really need to retrieve the MainEntity and its associations in a single request or UI. Instead, you could first show the MainEntity with paging and, based on the selection, fetch the associations for that MainEntity, also with paging.
Note that this is not a complete list, but it is a good starting point; based on your use case, you can see which of these (or other techniques) fit.

Best practice of designing JPARepository(ies) for ORM Domain graph

I have been designing Spring REST APIs using a standard MVC architecture: a domain layer of POJOs and repositories that fetch domain data from DB tables. So far these entities were isolated, so the design was a separate RestController, Service, and Repository flow for each entity.
I have been trying to understand the best practice when it comes to associations between domain objects, i.e., ORM. For example, let's take the following pseudocode to illustrate the domain classes (only to express the design in question; these are not complete classes):
@Entity
public class Customer {
    @Id
    @Column
    private int id;
    @Column
    private String name;
    @OneToMany(mappedBy = "customer")
    private List<Order> orders;
    //...getters setters
}
@Entity
public class Order {
    @Id
    @Column
    private int id;
    @Column
    private String orderNumber;
    @OneToMany(mappedBy = "order")
    private List<Product> products;
    @ManyToOne
    private Customer customer;
    //...getters setters
}
@Entity
public class Product {
    @Id
    @Column
    private int id;
    @Column
    private String productName;
    @ManyToOne
    private Order order;
    //...getters setters
}
Here is my dilemma from a design perspective. I have the following approaches, which may very well all be incorrect:
Define one RestController for Customer and provide all the API resources like /customers, /customers/id/orders, /customers/id/orders/id/products, etc. Have one Service that takes care of working with these domains. Have a separate JPARepository for EACH domain. The "keep it simple" aspect here is that I have a separate repository for each domain, so I just have to provide query methods in the corresponding Repository class to find details for a specific domain, i.e., fetch orders for a given customer id. However, that makes me think it defeats the purpose of using an ORM model, because I am fetching individual domains through their Repository classes. This option also wires all 3 repository classes into the service class, which I don't think is a good design either. 3 might look okay here, but I have 6 to 7 domains in the ORM graph in my actual requirements, which would mean autowiring 6 repositories into one service class.
One RestController and one Service class as in the option above, but with a single Repository class too. The Repository is created only for the Customer domain. This way I retrieve Customers with the other domains lazy loaded. This fulfils a GET request of "/customers". To fulfil a GET request of "/customers/id/orders", I again use the Customer Repository, retrieve the customer for the given id, and then return its list of Orders. Further, for a GET request of "/customers/id/orders/id/products", I would have to write a manual data-fetching mechanism in the Customer domain that retrieves the list of products for a given customerId and orderId. This way I use one Repository, satisfying the purpose of using ORM, but I add manual data-fetching methods to the Customer domain. Another negative I see is that I need to get the complete list of orders in the Customer domain even if I have both customerId and orderId available. Had I used a separate repository for Order, I could have fetched the single order based on customerId and orderId.
Both are incorrect and there exists a better approach.
I have looked through the Spring docs for repositories and the Hibernate docs for ORM. I went through multiple tutorials on one-to-many mappings with Spring Data REST, but found different approaches in different tutorials.
This question may look like a duplicate, as I have read multiple posts on Stack Overflow regarding this design concern, but none of the answers explain the trade-offs between the options I mentioned above. Hence, I am reposting this question.
It is a mixed approach. E.g., in your case the Product entity need not have a @ManyToOne relation with the Order. Imagine if your product is part of 1 million orders! How many times will you query a product to find its orders? You will query findOrdersByProduct(Product) rather than findProductByOrder(Order).
Think with respect to your use case. Sometimes it makes sense to have a unidirectional mapping if you will never fetch the information other than from the owner of the relationship.
Think about the amount of data that you will fetch (including the joins) when you query an entity.
E.g., if I am fetching an organization, do I need to fetch all its employees? Your system will go for a toss (lazy loading will save you most of the time, but if you have an Angular front end, it will bind and fetch the entire model). But it does make sense to have a many-to-one relationship with the organization from the Employee entity.

Play 2 framework many-to-many relation better design

In my app I have different Users and Items, so each user can pick many items.
In the tutorial I learned about the @ManyToMany annotation.
@Entity
public class Item extends Model {
    ...
    @ManyToMany(cascade = CascadeType.REMOVE)
    public List<User> users = new ArrayList<User>();
A second option I can think of is to define a separate class for the User-to-Item relation, so I can add additional information like the date and time.
@Entity
public class ItemUserRel extends Model {
    @Id
    public Long id;
    public User user;
    public Item item;
    //additional information
    public Date date;
    ...
Which of both options is better design and why?
I faced a similar issue a while ago. I also had to deal with a model User and the model Group. My requirements were:
A user can have n readable and n writable Groups. These permissions must be stored in a third table (not in the User table and not in the Group table), along with additional properties like authorisedBy and authorisedOn. So @ManyToMany did not work for me, because I had no real control over it. The additional properties also make it hard to map via JPA.
Perhaps other designs are possible, but I (still) think that introducing a new class UserGroup is best. This class has a @ManyToOne relation to a single User.
I ended up defining these three models:
User
Group - General information about the group model
UserGroup - Containing additional fields like: permissions, authorisedBy, authorisedOn etc.
On my User model, I have a getter getUserGroups() but also getPersonalGroup(), which is basically one (personal) Group instance among getUserGroups() where createdBy and authorisedBy are the same user.
I found this design much more maintainable and clearer. This design also helped me create a comfortable user interface, where the administrator can manage and change permissions for UserGroups.
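A minimal plain-Java sketch of the three models described above, with JPA annotations omitted so it runs without a persistence provider. The canRead/canWrite flags, the grant factory method, and writableGroups() are hypothetical illustrations of the "additional fields" idea, not the answerer's actual code; in JPA, UserGroup would be its own entity with two @ManyToOne fields.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class User {
    final String name;
    // Inverse side: every UserGroup row that points at this user
    final List<UserGroup> userGroups = new ArrayList<>();

    User(String name) { this.name = name; }

    // Groups this user may write to, derived from the association rows
    List<Group> writableGroups() {
        List<Group> result = new ArrayList<>();
        for (UserGroup ug : userGroups) {
            if (ug.canWrite) result.add(ug.group);
        }
        return result;
    }
}

class Group {
    final String name;
    Group(String name) { this.name = name; }
}

// The association class: one row per (user, group) pair plus its own data
class UserGroup {
    final User user;
    final Group group;
    final boolean canRead;
    final boolean canWrite;
    final User authorisedBy;
    final LocalDate authorisedOn;

    private UserGroup(User user, Group group, boolean canRead, boolean canWrite,
                      User authorisedBy, LocalDate authorisedOn) {
        this.user = user;
        this.group = group;
        this.canRead = canRead;
        this.canWrite = canWrite;
        this.authorisedBy = authorisedBy;
        this.authorisedOn = authorisedOn;
    }

    // Creates the association and keeps the user's in-memory list in sync
    static UserGroup grant(User user, Group group, boolean canRead, boolean canWrite,
                           User authorisedBy, LocalDate authorisedOn) {
        UserGroup ug = new UserGroup(user, group, canRead, canWrite, authorisedBy, authorisedOn);
        user.userGroups.add(ug);
        return ug;
    }
}
```

Because the permission flags and the authorisation data live on UserGroup, they can be queried and changed per membership, which is what a plain @ManyToMany join table cannot hold.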
Perhaps more useful information
Mapping many-to-many association table with extra column(s)
How Do I Create Many to Many Hibernate Mapping for Additional Property from the Join Table?

Hibernate: bidirectionality vs. unidirectionality, depending on relationship

Reading a wiki page about Hibernate, I came to some perplexing conclusions:
1) Bidirectionality is recommended in one-to-many
2) Bidirectionality is optional in many-to-one
3) Bidirectionality is normally present in many-to-many
4) Unidirectionality is recommended in one-to-one relationships, using as owner class the one with the primary key of the relation (not the foreign key).
Are these statements true? Do you have any example to explain why unidirectionality is recommended in some cases and bidirectionality in others?
Here's the wiki page (read under "concepts"):
http://wiki.elvanor.net/index.php/Hibernate
Note that "bidirectionality" in the context of Hibernate means that in your Java classes, both sides of the relationship maintain a link to the other side. It has no impact on the underlying database schema (except in the case of indexed collections, see below), it's just whether or not you want the Java side to reflect that.
For all of your conclusions, "recommended" actually translates to "it usually ends up making sense, given your business logic, that you'd do it this way".
You really want to read through chapters 7 and 8 of the Hibernate Core Reference Manual.
It's recommended if you need it. A lot of convenience comes from specifying a bidirectional relationship; particularly it becomes possible to navigate the relationship from both ends in your business logic. However, if you don't actually need to do this, there's nothing to gain. Use whatever is most appropriate for the situation. In practice I've found that I want to specify both ends of the relationship to Hibernate more often than not -- but it is not a rule, rather, it reflects what I want to accomplish.
This is true. In a many-to-one (or one-to-many) relationship, it is optional. Consider the following schema:
table: users
fields: userId, userName
table: forumPosts
fields: postId, userId, content
Where forumPosts.userId is a foreign key into users. Your DAO classes might be (getters/setters omitted for brevity):
public class User {
    private long userId;
    private String userName;
}
public class ForumPost {
    private long postId;
    private User user;
    private String content;
}
As you can see, this is a unidirectional many-to-one relationship (ForumPost-to-User). The ForumPost links to the user, but the User does not contain a list of ForumPosts.
You could then add a one-to-many mapping to User to make it have a list of ForumPosts. If you use a non-indexed collection like a set, this has no impact on the database schema. Merely by specifying both sides to Hibernate, you have made it bidirectional (using exactly the same schema as above), e.g.:
public class User {
    private long userId;
    private String userName;
    private Set<ForumPost> forumPosts;
}
public class ForumPost {
    private long postId;
    private User user;
    private String content;
}
Hibernate will now populate User.forumPosts when necessary (essentially with SELECT * FROM forumPosts WHERE userId = ?). The only difference between bidirectional and unidirectional here is that in one case Hibernate fills a set of ForumPosts in User, and in the other case it doesn't. If you ever have to get a collection of any given user's posts, you will want to use a bidirectional relationship like this rather than explicitly constructing an HQL query. Depending on your inverse/insert/update/cascade options in your relationship, you can also add and remove posts by modifying the User's set of posts, which may be a more accurate reflection of your business logic (or not!).
The reason I specified that non-indexed collections don't impact the underlying schema is because if you want to use an ordered, indexed collection like a list, you do have to add an extra list index field to the forumPosts table (although you do not have to add it to the ForumPost DAO class).
This is true, but it is not a requirement, and it's deeper than that. Same as above: bidirectionality is usually present in many-to-many. Many-to-many relationships are implemented with a third join table. You specify the details of this table on both sides of the relationship. You can simply not specify the relationship on one side, and now it's a unidirectional relationship. Again, whether or not you tell Hibernate about the mapping is what determines if it's unidirectional or bidirectional (in the context of Hibernate). In this case it also has no impact on the underlying schema unless you are using an ordered, indexed collection. In fact, the many-to-many example in the Hibernate reference manual is a unidirectional setup.
In reality, it would be odd to have a unidirectional many-to-many relationship, unless perhaps you are working with an existing database schema and your particular application's business logic has no need for one of the sides of the relationship. Usually, though, when you've decided you need a many-to-many relationship, you've decided that because you need to maintain a collection of references on both sides of the relationship, and your DAO classes would reflect that need.
So the correct conclusion here is not merely that "bidirectionality is normally present in many-to-many", but instead "if you've designed a database with a join table, but your business logic only uses a unidirectional relationship, you should question whether or not your schema is appropriate for your application (and it very well may be)".
This is not true. Exactly the same as all the points above. If you need to navigate the one-to-one relationship from both sides, then you'd want to make it bidirectional (specify both sides of the mapping to Hibernate). If not, then you make it unidirectional (don't specify both sides of the mapping to Hibernate). This again comes down to what makes sense in your business layer.
I hope that helps. I left a lot of intricacies out. You really should read through the Hibernate documentation; it is not organized particularly well, but Chapters 7 and 8 will tell you everything you need to know about collection mapping.
When I'm designing an application and a database from scratch, personally, I try to forget about Hibernate and the database entirely. I set up my DAOs in a way that makes sense for my business requirements, design a database schema to match, then set up the Hibernate mappings, making any final tweaks to the schema (e.g. adding index fields for ordered collections) at that point if necessary.

What is the difference between Unidirectional and Bidirectional JPA and Hibernate associations?

What is the difference between Unidirectional and Bidirectional associations?
Since the tables generated in the DB are the same, the only difference I found is that in a bidirectional association each side holds a reference to the other, while in a unidirectional one it does not.
This is a Unidirectional association
public class User {
    private int id;
    private String name;
    @ManyToOne
    @JoinColumn(name = "groupId")
    private Group group;
}
public class Group {
    private int id;
    private String name;
}
The Bidirectional association
public class User {
    private int id;
    private String name;
    @ManyToOne
    @JoinColumn(name = "groupId")
    private Group group;
}
public class Group {
    private int id;
    private String name;
    @OneToMany(mappedBy = "group")
    private List<User> users;
}
The difference is whether the group holds references to its users.
So I wonder: is this the only difference? Which is recommended?
The main difference is that bidirectional relationship provides navigational access in both directions, so that you can access the other side without explicit queries. Also it allows you to apply cascading options to both directions.
Note that navigational access is not always good, especially for "one-to-very-many" and "many-to-very-many" relationships. Imagine a Group that contains thousands of Users:
How would you access them? With so many Users, you usually need to apply some filtering and/or pagination, so you need to execute a query anyway (unless you use collection filtering, which looks like a hack to me). Some developers may tend to apply filtering in memory in such cases, which is obviously bad for performance. Note that having such a relationship can encourage such developers to use it without considering the performance implications.
How would you add new Users to the Group? Fortunately, Hibernate looks at the owning side of the relationship when persisting it, so it is enough to set User.group. However, if you want to keep objects in memory consistent, you also need to add the User to Group.users. But that would make Hibernate fetch all elements of Group.users from the database!
So, I can't agree with the recommendation from the Best Practices. You need to design bidirectional relationships carefully, considering use cases (do you need navigational access in both directions?) and possible performance implications.
See also:
Deterring “ToMany” Relationships in JPA models
Hibernate mapped collections performance problems
There are two main differences.
Accessing the association sides
The first one is related to how you will access the relationship. For a unidirectional association, you can navigate the association from one end only.
So, for a unidirectional @ManyToOne association, it means you can only access the relationship from the child side, where the foreign key resides.
If you have a unidirectional @OneToMany association, it means you can only access the relationship from the parent side, which manages the foreign key.
For the bidirectional @OneToMany association, you can navigate the association both ways, either from the parent or from the child side.
You also need to use add/remove utility methods for bidirectional associations to make sure that both sides are properly synchronized.
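A minimal sketch of such add/remove utility methods for the question's Group/User pair. The JPA mappings appear only as comments so the code runs as plain Java, and moving a user out of its previous group inside addUser is a choice of this sketch. As the earlier answer warns, in a real Hibernate session touching Group.users can trigger loading the whole collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Group {
    private final String name;
    // Inverse side; with JPA this would be @OneToMany(mappedBy = "group")
    private final List<User> users = new ArrayList<>();

    Group(String name) { this.name = name; }

    String getName() { return name; }

    // Updates both sides so the in-memory object graph stays consistent
    void addUser(User user) {
        Group previous = user.getGroup();
        if (previous == this) {
            return; // already a member
        }
        if (previous != null) {
            previous.removeUser(user); // a many-to-one allows only one group per user
        }
        users.add(user);
        user.setGroup(this);
    }

    void removeUser(User user) {
        if (users.remove(user)) {
            user.setGroup(null);
        }
    }

    List<User> getUsers() { return Collections.unmodifiableList(users); }
}

class User {
    private final String name;
    // Owning side (holds the FK); with JPA: @ManyToOne @JoinColumn(name = "groupId")
    private Group group;

    User(String name) { this.name = name; }

    String getName() { return name; }

    Group getGroup() { return group; }

    // Intended to be called only from Group.addUser/removeUser
    void setGroup(Group group) { this.group = group; }
}
```

Callers then use group.addUser(user) instead of setting the field directly, so neither side can drift out of sync.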
Performance
The second aspect is related to performance.
For @OneToMany, unidirectional associations don't perform as well as bidirectional ones.
For @OneToOne, a bidirectional association will cause the parent to be fetched eagerly if Hibernate cannot tell whether the Proxy should be assigned or a null value.
For @ManyToMany, the collection type makes quite a difference, as Sets perform better than Lists.
I'm not 100% sure this is the only difference, but it is the main difference. The Hibernate docs also recommend bidirectional associations:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/best-practices.html
Specifically:
Prefer bidirectional associations: Unidirectional associations are more difficult to query. In a large application, almost all associations must be navigable in both directions in queries.
I personally have a slight problem with this blanket recommendation -- it seems to me there are cases where a child doesn't have any practical reason to know about its parent (e.g., why does an order item need to know about the order it is associated with?), but I do see value in it a reasonable portion of the time as well. And since the bi-directionality doesn't really hurt anything, I don't find it too objectionable to adhere to.
In terms of coding, a bidirectional relationship is more complex to implement because the application is responsible for keeping both sides in sync, according to JPA specification 5 (on page 42). Unfortunately, the example given in the specification does not give more details, so it does not give an idea of the level of complexity.
When not using a second-level cache, it is usually not a problem if the relationship methods are not correctly implemented, because the instances get discarded at the end of the transaction.
When using second level cache, if anything gets corrupted because of wrongly implemented relationship handling methods, this means that other transactions will also see the corrupted elements (the second level cache is global).
A correctly implemented bi-directional relationship can make queries and the code simpler, but should not be used if it does not really make sense in terms of business logic.
