Persisting Ordered Domain Objects


A very common use case in many web applications is that domain objects can be ordered by a user in the web interface, and that order is persisted for later. I've noticed that every time I need to implement this, I end up with a different solution that adds considerable weight and complexity to a simple domain object. As an example, suppose I have the persistent entity Person:
class Person {
    long id;
    String name;
}
A user goes to /myapp/persons and sees all people in the system in the order in which they receive compensation (or something) - all of the people can be clicked and dragged and dropped into another position - when the page is loaded again, the order is remembered. The problem is that relational databases just don't seem to have a good way of doing this; nor do ORMs (hibernate is what I use)
I've been thinking of a generic ordering methodology - but there will be some overhead as the data would be persisted separately which could slow down access in some use cases. My question is: has anyone come up with a really good way to model the order of persistent domain objects?
I work in JavaEE with Hibernate or JDBCTemplate - so any code examples would be most useful with those technologies as their basis. Or any conceptual ideas would be welcome too.
UPDATE: I'm not sure where I went wrong, but it seems I was unclear, as most responses don't really match my question (as it is in my head). The problem is not how to order rows or columns when I fetch them - it is that the order of the domain objects changes - someone clicks and drags a "person" from the bottom of the list to the top of the list - they refresh the page and the list is now in the order they specified.

When fetching results, just build HQL queries with a different ORDER BY clause, depending on what the user has used last time

"The problem is that relational databases just don't seem to have a good way of doing this; nor do ORMs (hibernate is what I use)"
I'm not sure where you would get this impression. Hibernate specifically has support for mapping indexed collections (which is a "list" by another name), which usually boils down to storing a "list-index" column in the table holding the collection of items.
An example taken directly from the manual:
<list name="carComponents"
      table="CarComponents">
    <key column="carId"/>
    <list-index column="sortOrder"/>
    <composite-element class="CarComponent">
        <property name="price"/>
        <property name="type"/>
        <property name="serialNumber" column="serialNum"/>
    </composite-element>
</list>
This would allow a List<CarComponent> to be associated with your root entity, stored in the CarComponents table with a sortOrder column.

One possible generic solution:
Create a table to persist sort information. The simplest case would be one sortable field per entity with a direction:
table 'sorts'
* id: PK
* entity: String
* field: String
* direction: ASC/DESC enumeration (or ascending boolean flag)
It could be made more complicated by adding a userId to do per-user sorting or by adding a sort_items table with a foreign key to support sorting by multiple fields at a time.
Once you're persisting the sort information, it's a simple matter of adding Order instances to criteria (if that's what you're using) or concatenating order by statements to your HQL.
This also keeps your entities themselves free of any ordinal information, which in this case sounds like the right approach since the ordering is purely for user interaction purposes.
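As a rough sketch of the "concatenating order by statements to your HQL" step, assuming one persisted sort row per entity: SortSpec, OrderByBuilder and the whitelist of sortable fields are hypothetical names introduced only for illustration, not part of Hibernate. The whitelist is there because the field name ends up inside the query string.

```java
import java.util.Set;

/** Hypothetical in-memory form of one row of the 'sorts' table. */
final class SortSpec {
    final String entity;
    final String field;
    final boolean ascending;

    SortSpec(String entity, String field, boolean ascending) {
        this.entity = entity;
        this.field = field;
        this.ascending = ascending;
    }
}

/** Hypothetical helper that appends an ORDER BY clause to an HQL string. */
final class OrderByBuilder {
    private OrderByBuilder() {}

    static String appendOrderBy(String hql, String alias, SortSpec sort, Set<String> sortableFields) {
        // The field name is concatenated into the query, so only accept known property names.
        if (!sortableFields.contains(sort.field)) {
            throw new IllegalArgumentException("Not a sortable field: " + sort.field);
        }
        return hql + " order by " + alias + "." + sort.field + (sort.ascending ? " asc" : " desc");
    }
}
```

With a persisted sort of (Person, name, descending), the query "from Person p" would become "from Person p order by p.name desc".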
Update - Persisting entity order
Given the fact that you want to be able to reorder entities, not just define a sort for them, then you really do need to make an ordinal or index value part of the entity's definition.
The problem, as I'm sure you realize, is the number of entities that would need to be updated, with the worst case being moving the last entity to the top of the list.
You could use an increment value other than 1 (say 10) so you would have:
ordinal | name
10 | Crosby
20 | Stills
30 | Nash
40 | Young
Most of the time, updating the row would involve selecting two items and updating one. If I want to move Young to position 2, I select current item 2 and the previous item from the database to get the ordinals 10 and 20. Use these to create the new ordinal ((20 - 10) / 2 + 10 = 15). Now do a single update of Young with an ordinal of 15.
If you get to the point where division by two yields the same index as one of the entities you just loaded, that means it's time to spawn a task to normalize the ordinal values according to your original increment.
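The midpoint calculation and the "time to normalize" check could be sketched as follows. GapOrdinals and its method names are hypothetical, and the sketch assumes ordinals are whole numbers spaced by the original increment of 10.

```java
import java.util.OptionalLong;

/** Hypothetical helper for gap-based ordinals. */
final class GapOrdinals {
    static final long STEP = 10;

    private GapOrdinals() {}

    /**
     * Returns an ordinal strictly between the two neighbours, or empty when the
     * gap is used up and the ordinals need to be normalized first.
     */
    static OptionalLong between(long previous, long next) {
        if (previous >= next) {
            throw new IllegalArgumentException("previous must be smaller than next");
        }
        long candidate = previous + (next - previous) / 2;
        // Integer division landed on a neighbour: there is no free slot left.
        return candidate == previous ? OptionalLong.empty() : OptionalLong.of(candidate);
    }

    /** Fresh ordinals STEP, 2*STEP, ... for a list of the given size. */
    static long[] normalized(int count) {
        long[] ordinals = new long[count];
        for (int i = 0; i < count; i++) {
            ordinals[i] = (i + 1) * STEP;
        }
        return ordinals;
    }
}
```

Moving Young between Crosby (10) and Stills (20) gives between(10, 20), which is 15, so only Young's row is updated. When between returns empty, the normalization task would rewrite all rows with the values from normalized.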

As far as I know, JPA 2.0 provides support for ordered lists:
https://secure.wikimedia.org/wikibooks/en/wiki/Java_Persistence/Relationships#Order_Column_.28JPA_2.0.29

I think that relational databases cannot do better than a dedicated ordering column.
The idea of "order" is not really defined in SQL for anything but cursors, and they are not a core relational concept but rather an implementation detail.
For all I know the only thing to do is to abstract the ordering column away with @OrderColumn (JPA2, so Hibernate 3.5+ compatible).

Related

What is the DDD way to make sure that there is only one obj created with 2 attribute combinations

I'm pretty new to the whole DDD concept and I have the following question:
Let's say I have a UI where users can save cars by entering an id and a name. What is the DDD way to make sure that every unique id and name combination is only created once? The cars are all entities and will be stored in a database. Usually I would just have put a primary and a foreign key in the DB and checked whether the combination is already there: if not, create and store the object, and if the same combination exists, don't.
Now I'm wondering whether this is domain logic or just simple CRUD. If it is domain logic and I understood correctly, I should make my car object decide whether it is valid or not. If that's the case, how would I do that?
thanks in advance!
edit:
another thing: What if every created object should be deleted after 10 days. That would be a concept in the domain and would hence be also part of the domain logic. But how should the Object know when to delete itself and how should it do it? Would that be a domain service that checks the creation date of the objects and if it is older than 10 days it should perform a delete operation inside the DB?

I would go with a UNIQUE constraint on the 2 fields if you don't care about the validity of the values entered. That way, even if someone for some reason inserts or updates records directly in the DB, the DB will prevent duplicates.
If you care about the validity of the combined values entered, then you will have to add some logic on top of that in your code before saving to the DB.
About your deletion mechanism: you can have a scheduler that checks every day which records are older than 10 days, using a DB column filled at creation time (e.g. CREATED_ON), and deletes them.
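The selection step of such a scheduled job might look roughly like this. CarRecord and ExpiredCarFinder are hypothetical names, createdOn stands in for the CREATED_ON column, and in a real application the same cutoff would more likely be passed to a DELETE statement than applied in memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical row of the cars table, including its creation timestamp. */
final class CarRecord {
    final long id;
    final String name;
    final Instant createdOn;

    CarRecord(long id, String name, Instant createdOn) {
        this.id = id;
        this.name = name;
        this.createdOn = createdOn;
    }
}

/** Hypothetical helper used by the daily cleanup job. */
final class ExpiredCarFinder {
    static final Duration MAX_AGE = Duration.ofDays(10);

    private ExpiredCarFinder() {}

    /** Returns the records created more than MAX_AGE before 'now'. */
    static List<CarRecord> findExpired(List<CarRecord> cars, Instant now) {
        Instant cutoff = now.minus(MAX_AGE);
        List<CarRecord> expired = new ArrayList<>();
        for (CarRecord car : cars) {
            if (car.createdOn.isBefore(cutoff)) {
                expired.add(car);
            }
        }
        return expired;
    }
}
```

Passing the current time in as a parameter keeps the rule easy to test without waiting ten days.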

"It depends".
If id and name are immutable properties that are assigned at the beginning of the object's lifetime, then the straightforward thing to do is incorporate them into the key that you use to look up the aggregate.
car = Garage.get(id, name)
If instead what you have is a relation that changes over time (for instance, if you have to worry about name being corrupted by a data entry error) then things become more complicated.
The general term for the problem you are describing is set-validation. And the riddle is this: in order to reliably verify that a set has some property, you need to know that the property doesn't change between when you check it and when you commit your own change. In other words, you need to be able to lock the entire set.
Expressed more generally, the set is a collection of associated objects that we treat as a unit for the purpose of data changes. And we have a name for that pattern: aggregate.
So "the registry of names" becomes an aggregate in its own right - something that you can load, modify, store, and so on.
In some cases, it can make sense to partition that into smaller aggregates ("the set of things named Bob") - that reduces the amount of data you need to load/store when managing the aggregate itself, but adds some complexity to the use case when you change a name.
Is this "better" than the answer of just using database constraints? It depends on which side of the trade off you value more -- enforcing part of the domain invariant in the domain model and part of it in the data store adds complexity. Also, when you start leaning on the data store to enforce part of the invariant, you begin to limit your choices of what data store to use.
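To make the "registry as an aggregate" idea a little more concrete, here is a minimal in-memory sketch. CarKey and CarRegistry are hypothetical names, and the sketch deliberately leaves out persistence: for the invariant to hold, the registry would still have to be loaded and stored as one unit, under a lock or a version check.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical value object for one (id, name) combination. */
final class CarKey {
    final long id;
    final String name;

    CarKey(long id, String name) {
        this.id = id;
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CarKey)) {
            return false;
        }
        CarKey that = (CarKey) other;
        return id == that.id && name.equals(that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }
}

/** Hypothetical aggregate that owns the whole set of registered combinations. */
final class CarRegistry {
    private final Set<CarKey> registered = new HashSet<>();

    /** Returns true if the combination was new, false if it already existed. */
    boolean register(long id, String name) {
        return registered.add(new CarKey(id, name));
    }
}
```

The uniqueness rule lives in one place in the domain model, at the cost of loading and saving the registry whenever a car is added.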

Hibernate: initialization of complex object

I have problems fully loading a very complex object from the DB in a reasonable time and with a reasonable number of queries.
My object has a lot of embedded entities, each entity has references to other entities, those reference yet other entities and so on (the nesting level is 6).
So, I've created example to demonstrate what I want:
https://github.com/gladorange/hibernate-lazy-loading
I have User.
User has @OneToMany collections of favorite Oranges, Apples, Grapevines and Peaches. Each Grapevine has a @OneToMany collection of Grapes. Each fruit is another entity with just one String field.
I'm creating a user with 30 favorite fruits of each type, and each grapevine has 10 grapes. So in total I have 421 entities in the DB: 30*4 fruits, 10*30 grapes and one user.
And what I want: I want to load them using no more than 6 SQL queries.
And each query shouldn't produce a big result set (for this example, "big" means a result set with more than 200 records).
My ideal solution will be the following:
6 requests. The first request returns information about the user, and the size of the result set is 1.
The second request returns information about the Apples for this user, and the size of the result set is 30.
The third, fourth and fifth requests return the same as the second (with result set size = 30), but for Grapevines, Oranges and Peaches.
The sixth request returns the Grapes for ALL grapevines.
This is very simple in SQL world, but I can't achieve such with JPA (Hibernate).
I tried following approaches:
Use fetch join, like from User u join fetch u.oranges .... This is awful. The result set is 30*30*30*30 and execution time is 10 seconds. Number of requests = 3. I tried it without grapes; with grapes the result set would be 10 times larger.
Just use lazy loading. This gives the best result in this example (with @Fetch(FetchMode.SUBSELECT) for grapes). But in that case I need to manually iterate over each collection of elements. Also, subselect fetch is too global a setting, so I would like to have something that works at the query level. Result set and time are near ideal: 6 queries and 43 ms.
Loading with an entity graph. The same as fetch join, but it also makes a request for every grape to get its grapevine. The resulting time is better (6 seconds), but still awful. Number of requests > 30.
I tried to cheat JPA with "manual" loading of entities in separate query. Like:
SELECT u FROM User u WHERE u.id = 1;
SELECT a FROM Apple a WHERE a.user_id = 1;
This is a little bit worse than lazy loading, since it requires two queries for each collection: a first query to load the entities manually (I have full control over this query, including loading associated entities), and a second query to lazy-load the same entities by Hibernate itself (this one is executed automatically by Hibernate).
Execution time is 52 ms, number of queries = 10 (1 for user, 1 for grapes, 4*2 for the fruit collections).
Actually, the "manual" solution in combination with SUBSELECT fetch allows me to use "simple" fetch joins to load necessary entities in one query (like @OneToOne entities), so I'm going to use it. But I don't like that I have to perform two queries to load a collection.
Any suggestions?

I usually cover 99% of such use cases by using batch fetching for both entities and collections. If you process the fetched entities in the same transaction/session in which you read them, then there is nothing additionally that you need to do, just navigate to the associations needed by the processing logic and the generated queries will be very optimal. If you want to return the fetched entities as detached, then you initialize the associations manually:
User user = entityManager.find(User.class, userId);
Hibernate.initialize(user.getOranges());
Hibernate.initialize(user.getApples());
Hibernate.initialize(user.getGrapevines());
Hibernate.initialize(user.getPeaches());
user.getGrapevines().forEach(grapevine -> Hibernate.initialize(grapevine.getGrapes()));
Note that the last command will not actually execute a query for each grapevine, as multiple grapes collections (up to the specified @BatchSize) are initialized when you initialize the first one. You simply iterate all of them to make sure all are initialized.
This technique resembles your manual approach but is more efficient (queries are not repeated for each collection), and is more readable and maintainable in my opinion (you just call Hibernate.initialize instead of manually writing the same query that Hibernate generates automatically).

I'm going to suggest yet another option on how to lazily fetch collections of Grapes in Grapevine:
@OneToMany
@BatchSize(size = 30)
private List<Grape> grapes = new ArrayList<>();
Instead of doing a sub-select, this one would use in (?, ?, etc) to fetch many collections of Grapes at once. Grapevine IDs will be passed in place of the ? placeholders. This is as opposed to querying one List<Grape> collection at a time.
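The effect of batching can be illustrated without Hibernate at all. The sketch below only shows the idea of grouping owner IDs into IN lists; BatchFetchSketch is a hypothetical class, and Hibernate's real batching strategy may pick different chunk sizes than this simple partitioning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of how owner IDs are grouped into IN (...) lists. */
final class BatchFetchSketch {
    private BatchFetchSketch() {}

    /** Splits the owner IDs into consecutive chunks of at most batchSize elements. */
    static List<List<Long>> partition(List<Long> ownerIds, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ownerIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ownerIds.size());
            batches.add(new ArrayList<>(ownerIds.subList(start, end)));
        }
        return batches;
    }

    /** Builds the placeholder list for one batch, e.g. "in (?, ?, ?)" for three IDs. */
    static String inClause(int idCount) {
        return "in (" + String.join(", ", Collections.nCopies(idCount, "?")) + ")";
    }
}
```

With 30 grapevines and a batch size of 30, all grapes collections fit into a single batch; with a batch size of 10 there would be three.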
That's just yet another technique to your arsenal.

I do not quite understand your demands here. It seems to me you want Hibernate to do something that it's not designed to do, and when it can't, you want a hack-solution that is far from optimal. Why not loosen the restrictions and get something that works? Why do you even have these restrictions in the first place?
Some general pointers:
When using Hibernate/JPA, you do not control the queries. You are not supposed to either (with a few exceptions). How many queries, the order they are executed in, etc, is pretty much beyond your control. If you want complete control of your queries, just skip JPA and use JDBC instead (Spring JDBC for instance.)
Understanding lazy-loading is key to making decisions in this type of situation. Lazy-loaded relations are not fetched when getting the owning entity; instead Hibernate goes back to the database and gets them when they are actually used. This means that lazy-loading pays off if you don't use the attribute every time, but has a penalty the times you actually use it. (Fetch join is used for eager-fetching a lazy relation. Not really meant for use with regular load from the database.)
Query optimization using Hibernate should not be your first line of action. Always start with your database. Is it modelled correctly, with primary keys and foreign keys, normal forms, etc? Do you have search indexes in the proper places (typically on foreign keys)?
Testing for performance on a very limited dataset probably won't give the best results. There will probably be overhead with connections, etc, that is larger than the time spent actually running the queries. Also, there might be random hiccups that cost a few milliseconds, which can make the result misleading.
Small tip from looking at your code: Never provide setters for collections in entities. If actually invoked within a transaction, Hibernate will throw an exception.
tryManualLoading probably does more than you think. First, it fetches the user (with lazy loading), then it fetches each of the fruits, then it fetches the fruits again through lazy-loading. (Unless Hibernate understands that the queries will be the same as when lazy loading.)
You don't actually have to loop through the entire collection in order to initiate lazy-loading. You can do this user.getOranges().size(), or Hibernate.initialize(user.getOranges()). For the grapevine you would have to iterate to initialize all the grapes though.
With proper database design, and lazy-loading in the correct places, there shouldn't be a need for anything other than:
em.find(User.class, userId);
And then maybe a join fetch query if a lazy load takes a lot of time.
In my experience, the most important factor for speeding up Hibernate is search indexes in the database.

Java - Google App Engine - modelling graph structures in Google Datastore

Google App Engine offers the Google Datastore as the only NoSQL database (I think it is based on BigTable).
In my application I have a social-like data structure and I want to model it as I would do in a graph database. My application must save heterogeneous objects (users,files,...) and relationships among them (such as user1 OWNS file2, user2 FOLLOWS user3, and so on).
I'm looking for a good way to model this typical situation, and I thought of two families of solutions:
List-based solutions: Any object contains a list of other related objects and the object presence in the list is itself the relationship (as Google said in the JDO part https://developers.google.com/appengine/docs/java/datastore/jdo/relationships).
Graph-based solution: Both nodes and relationships are objects. The objects exist independently from the relationships while each relationship contain a reference to the two (or more) connected objects.
What are strong and weak points of these two approaches?
About approach 1: This is the simpler approach one can think of, and it is also presented in the official documentation but:
Each directed relationship makes the object record grow: are there any limits on the number of possible relationships, given for instance by the object size limit?
Is that a JDO feature, or does the datastore structure itself allow that approach to be implemented naturally?
The relationship search time will increase with the list. Is this solution suitable for a large number (millions) of relationships?
About approach 2: Each relationship can have a higher level of characterization (it is an object and it can have properties). And I think memory size is not a Google problem, but:
Each relationship requires its own record, so the search time for each related couple will increase as the total number of relationships increases. Is this suitable for a large number of relationships (millions, billions)? I.e. does Google have good tricks to search among records if they are well structured? Or will I soon be in a situation in which, if I want to search for a friend of User1 called User4, I have to wait seconds?
On the other hand, each object doesn't grow as new relationships are added.
Could you help me find other important points about the two approaches, so that I can choose the best model?

First, the search time in the Datastore does not depend on the number of entities that you store, only on the number of entities that you retrieve. Therefore, if you need to find one relationship object out of a billion, it will take the same time as if you had just one object.
Second, the list approach has a serious limitation called "exploding indexes". You will have to index the property that contains a list to make it searchable. If you ever use a query that references more than just this property, you will run into this issue - google it to understand the implications.
Third, the list approach is much more expensive. Every time you add a new relationship, you will rewrite the entire entity at considerable writing cost. The reading costs will be higher too if you cannot use keys-only queries. With the object approach you can use keys-only queries to find relationships, and such queries are now free.
UPDATE:
If your relationships are directed, you may consider making Relationship entities children of User entities, and using an Object id as an id for a Relationship entity as well. Then your Relationship entity will have no properties at all, which is probably the most cost-efficient solution. You will be able to retrieve all objects owned by a user using keys-only ancestor queries.

I have an AppEngine application and I use both approaches. Which is better depends on two things: the practical limits of how many relationships there can be and how often the relationships change.
NOTE 1: My answer is based on experience with Objectify and heavy use of caching. Mileage may vary with other approaches.
NOTE 2: I've used the term 'id' instead of the proper DataStore term 'name' here. Name would have been confusing and id matches objectify terms better.
Consider users linked to the schools they've attended and vice versa. In this case, you would do both. Link the users to schools with a variation of the 'List' method. Store the list of school ids the user attended as a UserSchoolLinks entity with a different type/kind but with the same id as the user. For example, if the user's id = '6h30n' store a UserSchoolLinks object with id '6h30n'. Load this single entity by key lookup any time you need to get the list of schools for a user.
However, do not do the reverse for the users that attended a school. For that relationship, insert a link entity. Use a combination of the school's id and the user's id for the id of the link entity. Store both id's in the entity as separate properties. For example, the SchoolUserLink for user '6h30n' attending school 'g3g0a3' gets id 'g3g0a3~6h30n' and contains the fields: school=g3g0a3 and user=6h30n. Use a query on the school property to get all the SchoolUserLinks for a school.
Here's why:
Users will see their schools frequently but change them rarely. Using this approach, the user's schools will be cached and won't have to be fetched every time they hit their profile.
Since you will be getting the user's schools via a key lookup, you won't be using a query. Therefore, you won't have to deal with eventual consistency for the user's schools.
Schools may have many users that attended them. By storing this relationship as link entities, we avoid creating a huge single object.
The users that attended a school will change a lot. This way we don't have to write a single, large entity frequently.
By using the id of the User entity as the id for the UserSchoolLinks entity we can fetch the links knowing just the id of the user.
By combining the school id and the user id as the id for the SchoolUserLink, we can do a key lookup to see if a user and school are linked. Once again, no need to worry about eventual consistency for that.
By including the user id as a property of the SchoolUserLink we don't need to parse the SchoolUserLink object to get the id of the user. We can also use this field to check consistency between both directions and have a fallback in case somehow people are attending hundreds of schools.
Downsides:
1. This approach violates the DRY principle. Seems like the least of evils here.
2. We still have to use a query to get the users who attended a school. That means dealing with eventual consistency.
Don't forget to update the UserSchoolLinks entity and add/remove the SchoolUserLink entity in a single transaction.
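The composite id used for the link entity can be sketched in plain Java, independent of Objectify. SchoolUserLinkId is a hypothetical helper, and the sketch assumes that neither id may itself contain the '~' separator.

```java
/** Hypothetical helper for the composite id of a SchoolUserLink entity. */
final class SchoolUserLinkId {
    private static final char SEPARATOR = '~';

    private SchoolUserLinkId() {}

    /** Builds the link id, e.g. "g3g0a3~6h30n" for school g3g0a3 and user 6h30n. */
    static String of(String schoolId, String userId) {
        requireNoSeparator(schoolId);
        requireNoSeparator(userId);
        return schoolId + SEPARATOR + userId;
    }

    static String schoolIdOf(String linkId) {
        return linkId.substring(0, separatorIndex(linkId));
    }

    static String userIdOf(String linkId) {
        return linkId.substring(separatorIndex(linkId) + 1);
    }

    private static int separatorIndex(String linkId) {
        int index = linkId.indexOf(SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("Not a link id: " + linkId);
        }
        return index;
    }

    private static void requireNoSeparator(String id) {
        // An id containing the separator would make the composite id ambiguous.
        if (id.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("Id must not contain '" + SEPARATOR + "': " + id);
        }
    }
}
```

Because the id is derived from both sides of the relationship, checking whether a user attended a school is a key lookup on SchoolUserLinkId.of(schoolId, userId) rather than a query.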

Your question is too complex, but I will try to explain the best solution (I will answer in Python, but the same can be done in Java).
from google.appengine.ext import db
class User(db.Model):
    followers = db.StringListProperty()
Adding a follower is simple:
user = User.get(key)
user.followers.append(str(followerKey))
user.put()  # persist the changed list
This allows a fast query for who is followed, given a follower:
User.all().filter('followers', str(followerKey)) # -> followed
This query is costly in i/o, so you can make it faster at the price of more complexity and more costly writes:
class User(db.Model):
    followers = db.StringListProperty()
    follows = db.StringListProperty()
However, this complicates changes, since deleting a User requires updating follows as well, so you need 2 writes.
You can also store relationships as separate entities, but that is the worst scenario since it is more complex than the second example with followers and follows ... - keep in mind that an entity can be at most 1 MB; you may never hit that limit, but with long lists you can.

Hibernate three tables many to many

I have a database with 3 tables. The main table is Contract, and it is joined with pairs of keys from two tables: Languages and Regions.
Each pair is unique, but it is possible that one contract will have the following pair ids:
{ (1,1), (1,2), (2,1), (2,2) }
Today, the three tables are linked via a connecting entity called ContractLanguages. It contains a sequence id, and triplets of ids from the three tables.
However, in large enough contracts this causes a serious performance issue, as the hibernate environment creates a staggering amount of objects.
Therefore, we would like to remove this connecting entity, so that Contract will hold some collection of these pairs.
Our proposed solution: create an @Embeddable class containing the Language and Region ids, and store them in the Contract entity.
The idea behind this is that there is a relatively small number of languages and regions.
We are assuming that hibernate manages a list of such pairs and does not create duplicates, therefore substantially reducing the amount of objects created.
However, we have the following questions:
Will this solution work? Will hibernate know to create the correct object?
Assuming the solution works (the link is created correctly), will hibernate optimize the object creation to stop creating duplicate objects?
If this solution does not work, how do we solve the problem mentioned above without a connecting entity?

From your post and comments I assume the following situation, please correct me if I'm wrong:
You have a limited set of Languages + Regions combinations (currently modelled as ContractLanguages entities)
You have a huge amount of Contract entities
Each contract can reference multiple Languages and Regions
You have problems loading all the contract languages because currently the combination consists of contract + language + region
Based on those assumptions, several possible optimizations come to my mind:
You could create a LanguageRegion entity which has a unique id and each contract references a set of those. That way you'd get one more table but Hibernate would just create one entity per LanguageRegion and load it once per session, even if multiple contracts would reference it. For that to work correctly you should employ lazy loading and maybe load those LanguageRegion entities into the first level cache before loading the contracts.
Alternatively you could just load columns that are needed, i.e. just load parts of an entity. You'd employ lazy loading as well but wouldn't access the contract languages directly but load them in a separate query, e.g. (names are guessed)
SELECT c.id, lang.id, lang.name, region.id, region.name FROM Contract c
JOIN c.contractLanguages cl
JOIN cl.language lang
JOIN cl.region region
WHERE c.id in (:contractIds)
Then you load the contracts, get their ids, and load the language and region details using that query (it returns a List<Object[]>, with each object array containing the column values as selected). You put those into an appropriate data structure and access them as needed. That way you'd bypass entity creation and just get the data that is needed.
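Putting the List<Object[]> into "an appropriate data structure" could look like the sketch below. LanguageRegionPair and ContractLanguageRows are hypothetical names, and the column order is assumed to match the select list above (contract id, language id, language name, region id, region name).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain value holder for one language/region combination. */
final class LanguageRegionPair {
    final long languageId;
    final String languageName;
    final long regionId;
    final String regionName;

    LanguageRegionPair(long languageId, String languageName, long regionId, String regionName) {
        this.languageId = languageId;
        this.languageName = languageName;
        this.regionId = regionId;
        this.regionName = regionName;
    }
}

/** Hypothetical helper that groups the raw query rows by contract id. */
final class ContractLanguageRows {
    private ContractLanguageRows() {}

    static Map<Long, List<LanguageRegionPair>> groupByContract(List<Object[]> rows) {
        Map<Long, List<LanguageRegionPair>> byContract = new LinkedHashMap<>();
        for (Object[] row : rows) {
            // Assumed column order: c.id, lang.id, lang.name, region.id, region.name
            long contractId = ((Number) row[0]).longValue();
            LanguageRegionPair pair = new LanguageRegionPair(
                    ((Number) row[1]).longValue(), (String) row[2],
                    ((Number) row[3]).longValue(), (String) row[4]);
            byContract.computeIfAbsent(contractId, id -> new ArrayList<>()).add(pair);
        }
        return byContract;
    }
}
```

The resulting map holds only small value objects, so no managed entities are created for the pairs.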

Should I iterate over hibernate collections to find an entity, or use criteria

What is the convention for this? Say for example I have the following, where an item bid can only be a bid on one item:
public class Item {
    @OneToMany(mappedBy = "item")
    Set<ItemBid> itemBids = new HashSet<ItemBid>();
}
If I am given the name of the item bidder (which is stored in ItemBid), should I A) load the item using an item DAO and iterate over the collection of its itemBids until I find the one with the name I want, or B) create an ItemBid DAO where the item and the bidder name are used in criteria or HQL?
I would presume that B) would be the most efficient with very large collections, so would this be standard for retrieving very specific items from large collections? If so, can I have a general guideline as to when I should be using the collections, and when I should be using DAOs / Criteria?

Yes, you should definitely query bids directly. Here are the guidelines:
If you are searching for a specific bid, use query
If you need a subset of bids, use query
If you want to display all the bids for a given item - it depends. If the number of bids is reasonably small, fetch an item and use collection. Otherwise - query directly.
Of course from OO perspective you should always use a collection (preferably having findBy*() methods in Item accessing bids collection internally) - which is also more convenient. However if the number of bids per item is significant, the cost of (even lazy-) loading will be significant and you will soon run out of memory. This approach is also very wasteful.

You should be asking yourself this question much sooner: at the time you were doing the mapping. Mapping for ORM should be intellectual work, not a matter of copying all the foreign keys onto attributes on both sides (if only because of YAGNI, but there are many other good reasons).
Chances are, the bid-item mapping would be better as unidirectional (then again, maybe not).
In many cases we find that certain entities are strongly associated with an almost fixed number of some other entities (they would probably be called "aggregates" in DDD parlance). For example invoices and invoice items. Or a person and a list of his hobbies. Or a post and a set of tags for this post. We do not expect that the number of items in a given invoice will grow over time, nor will the number of tags. So they are all good places to map a @OneToMany. On the other hand, the number of invoices for each client will be growing - so we would just map a unidirectional @ManyToOne from invoice to client - and query.
Repositories (daos, whatever) that do queries are perfectly good OO (nothing wrong with a query; it is just an object describing your requirements in a storage-neutral way); using finders in entities - not so. From practical point of view it binds your entities to data access layer (DAOs or even JPA classes), and this will make them unusable in many use cases (GWT) or tricky to use when detached (you will have to guess which methods work outside session). From the philosophical point of view - it violates the single responsibility principle and changes your JPA entities into a sort of active record wannabe.
So, my answer would be:
if you need a single bid, query directly,
if you want to display all the bids for a given item - fetch an item and use the collection. This does not depend on the number of bids per item, as the query performed by JPA will be identical to a query you might perform yourself. If this approach needs tuning (like in a case where you need to fetch a lot of items and want to avoid the "N + 1 selects problem") then there are plenty of ways (join fetch, eager fetching, hints) to make it right, without changing the part of the code that uses getBids().
The simplest way to think about it is: if you think that some collection will never be displayed with paging (like tags on post, items on invoice, hobbies on person), map it with @OneToMany and access it as a collection.
