Storing parent-child relationship in MarkLogic - java

For MarkLogic (and maybe for NoSQL in general?), is it best to store a parent-child relationship as one document? Thus, coming from a relational world, would a normalized parent-child table need to be denormalized and stored as a single document?
Will this design impact how searches are done (since children records now are searched always in the context of the parent)?

It might depend on whether children can have multiple parents (e.g. graph-type data instead of hierarchical), but my reasoning would be that hierarchical data is best stored in its natural hierarchical form (using XML or JSON or such). That doesn't mean storing the entire parent-child table as one document, but rather expanding the records into their original trees and storing those trees as documents.
This will not fit all NoSQL solutions, but it will work well for those that fall into the document store category, particularly if they provide good search across both content and hierarchy, like MarkLogic.
Note: graph-type data can be stored as triples inside MarkLogic, which allows querying it with SPARQL and, for instance, inferencing over it.
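As a minimal sketch of the idea (assuming the MarkLogic Java Client API; the order structure, URI, and connection details are made up), a parent and its children land in MarkLogic as one JSON document:

```java
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;

public class OrderWriter {
    public static void main(String[] args) {
        // Placeholder connection details; adjust for your environment.
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("user", "password"));

        // The parent (order) and its children (line items) are one tree,
        // so line items are always searched in the context of their order.
        String orderJson = "{ \"order\": {"
                + " \"id\": 1042,"
                + " \"customer\": \"Acme\","
                + " \"lineItems\": ["
                + "   { \"sku\": \"A-1\", \"qty\": 2 },"
                + "   { \"sku\": \"B-7\", \"qty\": 1 }"
                + " ] } }";

        JSONDocumentManager docMgr = client.newJSONDocumentManager();
        docMgr.write("/orders/1042.json",
                new StringHandle(orderJson).withFormat(Format.JSON));

        client.release();
    }
}
```

A search that matches a line item then naturally returns, or is scoped to, its containing order document.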
HTH!

It's not that the parent-child relationship is "denormalized", but rather the children are "merged" into the parent.
One thing to consider is the type of relationship you have. UML provides descriptions for different kinds of relationships - see Difference between association, aggregation and composition.
In general (exceptions exist), I think association and aggregation relationships will be between separate documents, whereas composition relationships will be "merged" into a single document.
Concrete example - a person knows many persons (association), a person can own many vehicles (aggregation, a vehicle only has one owner, but its own lifecycle), and a person can have many names (composition). I would create person and vehicle documents, but not name documents - I would store all the names on the person document.
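A rough sketch of how those three relationships could look as documents (hypothetical classes, fields only, for illustration):

```java
import java.util.List;

// Hypothetical document classes illustrating the three relationship kinds.
public class Person {
    public String id;
    // Composition: names have no lifecycle of their own,
    // so they are merged into the person document.
    public List<String> names;
    // Aggregation: vehicles have their own lifecycle,
    // so the person only stores references to vehicle documents.
    public List<String> vehicleIds;
    // Association: links to other person documents.
    public List<String> knowsPersonIds;
}

class Vehicle {
    public String id;
    public String ownerId; // a vehicle has exactly one owner
    public String vin;
}
```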
To me, that's a big advantage of a document database over a relational database. In the latter, I'm forced to create separate tables no matter what kind of relationship I have. In a document database, I can choose what makes the most sense and fits my application's needs. Very often, my physical document model much more closely resembles my application's conceptual model.

Related

JPA entity responsibility in domain driven design

I am wondering if JPA entity and DDD entity should be the same class?
I can see in examples that it is a common practice, but doesn't it violate single responsibility principle?
I believe this to be common practice due to a level of convenience, but it's not particularly clean design. In most cases it's coincidental that the JPA entities match the domain objects, or are at least close enough.
JPA is an abstraction to the persistent data layer and its core focus is providing an object to relational data mapping. The JPA entities therefore really only represent object hierarchies of the data model.
It may well be that your domain objects consist only of elements that are represented and stored within a persistent data store, in which case creating both domain and JPA entities containing the exact same data structures would feel somewhat like duplication.
True domain objects live at the center of the application's architecture where all dependencies point towards them and this would also include the data layer. I would always recommend this approach purely as it clarifies the actual intent for the architectural boundaries.
Edit.
To answer the second part of your question, on SRP violation in JPA - it depends. The responsibilities (in the SRP sense) do tend to match relational tables, since we tend to logically group related data together (think of an Account table, or a Contact table). This falls down more often in JPA, though, when thinking about relationships (Employee -> Salary).
I am wondering if JPA entity and DDD entity should be the same class? I can see in examples that it is a common practice, but doesn't it violate single responsibility principle?
You may want to review Classes vs Data Structures, by Robert Martin.
I would normally expect JPA entities to be "anemic bags of data"; they are essentially messages written in the past to be consumed in the future.
Domain Model Entities, on the other hand, are not anemic - they have direct understanding of how to mutate their own data structures in accordance with the rules of the domain in which they serve.
In the DDD book, Evans describes using the "factory" pattern to create an instance of a domain entity from raw data. That pattern fits equally well with creating a domain entity from a JPA entity.
The transformation in the other direction -- taking a domain entity and extracting from it the data you need to save -- is not clearly addressed, but the mechanics are the same. You read data out of your domain entity and write it into your JPA entity (whether you write into a new JPA entity or update one that already exists will depend on the details of your persistence strategy).
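A minimal sketch of both directions, assuming hypothetical Account/AccountRecord classes and javax.persistence annotations:

```java
import javax.persistence.Entity;
import javax.persistence.Id;

// Anemic JPA entity: a bag of data mapped to a table (hypothetical schema).
@Entity
class AccountRecord {
    @Id
    Long id;
    long balanceCents;
}

// Domain entity: owns its invariants and knows nothing about persistence.
class Account {
    private final Long id;
    private long balanceCents;

    private Account(Long id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }

    // Factory: build the domain entity from the raw persisted data.
    static Account from(AccountRecord record) {
        return new Account(record.id, record.balanceCents);
    }

    // The domain rule lives here, not in the JPA entity.
    void withdraw(long cents) {
        if (cents > balanceCents) {
            throw new IllegalStateException("insufficient funds");
        }
        balanceCents -= cents;
    }

    // Reverse direction: extract the data that needs to be saved.
    AccountRecord toRecord() {
        AccountRecord record = new AccountRecord();
        record.id = id;
        record.balanceCents = balanceCents;
        return record;
    }
}
```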
You aren't guaranteed to make a mess if you try to make the two entities "the same", but there are definitely different concerns.
For example, our persistent representation of data is expected to have a life cycle that spans many releases; new versions of our domain model are supposed to be able to work with our previously stored data. At the same time, we should be able to change the data structures that we use inside the domain model freely. Semantically, it's the same information, but the pressures on structure and organization are very different.

Auditing using Data tables vs Separate Audit tables

I am in the process of designing a new Java application which has very strict requirements for auditing. Here is some brief context:
I have a complex entity with multiple one-to-many nested relationships. If any of the fields changes, I need to consider it a new version of the object, and all of this needs to be audited as well. Now I have two options:
1.) Do not do any update operation; just insert a new entity whenever anything changes. This would require me to create all the related objects (even if they have not been changed), as I do not want to hold references to any previous-version objects. My data tables become my auditing tables as well.
OR
2.) Always do an update operation and maintain the auditing information in separate tables. That would add some more complexity in terms of implementation.
I would like to know if either of these two approaches is considered good or bad practice.
Thanks,
-csn
What should define your choice is your insert/update/read patterns for both the "live" data and the audits.
Most commonly these patterns are very different for the two kinds.
- Concerning "live" data, it depends a lot on your application, but I can imagine you have significant inserts, significant updates, and lots of reads. Live data also require transactionality and have many relationships between tables for which you need to keep consistency. They might require fast and complex search, with indexes on many columns.
- Audits have lots of inserts, almost no updates, and few reads. Reads don't require complex search (e.g. you only consult audits and sort them by date) or indexes on many columns.
So with increased load and data size you will probably need to split the data and optimize tables for your use cases.
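As a rough sketch of option 1 (the insert-only approach; the entity and columns are hypothetical), every change inserts a new row with an incremented version number, so the data table doubles as the audit trail:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

// Insert-only versioning: rows are never updated, only added.
// The current state of a contract is the row with the highest version.
@Entity
@Table(name = "contract",
       uniqueConstraints = @UniqueConstraint(columnNames = {"businessId", "version"}))
class ContractVersion {
    @Id
    Long id;         // surrogate key, new for every version (generation omitted)
    Long businessId; // stable identifier of the logical entity
    int version;     // incremented on every change
    String terms;    // ...plus copies of all nested/child data, changed or not
}
```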

Custom hibernate entity persister

I am in the process of performance testing/optimizing a project that maps
a document <--> Java object tree <--> MySQL database
The document, Java classes, database schema, and mapping logic are orchestrated with HyperJaxb3. The ORM piece of it is JPA provided by Hibernate.
There are about 50 different entities and obviously lots of relationships between them. A major feature of the application is to load the documents and then reorganize the data into new documents; all the pieces of each incoming document eventually get sent out in one outgoing document. While I would prefer not to be living in the relational world, the transactional semantics are a very good fit for this application - there is a lot of money and government regulation involved, so we need to make sure everything gets delivered exactly once.
Functionally, everything is going well and performance is decent (after a fair amount of tweaking). Each document is made up of a few thousand entities which end up creating a few thousand rows in the database. The documents vary in size, and insert performance is pretty much proportional to the number of rows that need to be inserted (no surprise there).
I see the potential for a significant optimization, and this is where my question lies.
Each document is mapped to a tree of entities. The "leaf" half of the tree contains lots of detailed information that is not used in the decisions for how to generate the outgoing documents. In other words, I don't need to be able to query/filter by the contents of many of the tables.
I would like to map the appropriate entity sub-trees to blobs, and thus save the overhead of inserting/updating/indexing the majority of the rows I am currently handling the usual way.
It seems that my best bet is to implement a custom EntityPersister and associate it with the appropriate entities. Is this the right way to go? The Hibernate docs are not bad, but it is a fairly complex class to implement, and I am left with lots of questions after looking at the javadoc. Can you point me to a concrete yet simple example that I can use as a starting point?
Any thoughts about another way to approach this optimization?
I've run into the same problem with storing large amounts of binary data. The solution I found works best is a denormalization of the object model. For example, I create a master record, and then I create a second object that holds the binary data. On the master, use a @OneToOne mapping to the secondary object, but mark the association as lazy. Now the data will only be loaded if you need it.
The one thing that might slow you down is the outer join that Hibernate performs with all objects of this type. To avoid it, you can mark the association as mandatory (non-optional). But if the database doesn't give you a huge performance hit, I suggest you leave it alone. I found that Hibernate has a tendency to load the binary data immediately if I tried to use a regular join.
Finally, if you need to retrieve a lot of the binary data in a single SQL call, use an HQL fetch join. For example: from Article a join fetch a.data, where a.data is the one-to-one relationship to the binary holder. The HQL compiler will see this as an instruction to get all the data in a single SQL call.
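A minimal sketch of that mapping, assuming hypothetical Article/ArticleData entities and javax.persistence annotations:

```java
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.OneToOne;

@Entity
class Article {
    @Id
    Long id;
    String title;

    // Lazy, non-optional one-to-one: the blob row is only read when accessed.
    @OneToOne(fetch = FetchType.LAZY, optional = false)
    ArticleData data;
}

@Entity
class ArticleData {
    @Id
    Long id;

    @Lob
    byte[] payload; // the bulky detail that never needs to be queried
}
```

The optional = false flag matches the "mark the association as mandatory" advice above: without it, Hibernate cannot tell a null association from an unloaded one and may fetch eagerly.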
HTH

How to model relationship between different parent entities and one common child entity

I'm looking for some suggestions for best practices around modeling the relationship between various entities and their documents (binaries such as PDF, TIFF etc). The entities are standard JPA/Hibernate stored in a PostgreSQL database. The documents themselves will be stored in MongoDb database.
The plan is to create a child entity to represent the document, which contains the id of the binary data so it can be retrieved as needed. But what would the relationship be?
If I simply created one of these document entities for each parent entity, then a simple one-to-many relationship would work, but that seems redundant.
I could simply put a "type" column that indicates which entity the document belongs to, and then query the document table with a named query of "id = ? and type = ?". I guess that would work, but there is something about that I'm not crazy about either - just can't put my finger on it :) Maybe that's just fine.
Another option I have looked at (although I admit I have never used it before and would need to study it a bit more) is to use a unidirectional one-to-many with a join table. However, I don't think this will work either, since there is no guarantee that there wouldn't be duplicate parent keys. I use a single sequence for the primary keys of all base tables, which should guarantee uniqueness, but it still doesn't sound like a good idea.
Finally, I have considered whether I create an entity and then extend it for each parent entity, but I think that would have the same flaw - the theoretical existence of non-unique parent ids.
Before I make a final decision, I'd like to see what other suggestions the community might have to offer.
Thanks in advance for your ideas.
If I simply created one of these document entities for each parent entity, then a simple one-to-many relationship would work, but that seems redundant.
I'm a bit confused. If you create a document for each parent, isn't that one-to-one and not one-to-many? It sounds like you want a one-to-many relationship. In which case, you would only create a single document for all parent entities that reference it.
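As a hedged sketch of that direction (the entities and fields are made up; javax.persistence annotations assumed), each parent type holds a many-to-one reference to a shared document entity, which carries the MongoDB id of the binary:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

// Shared child entity: one row per stored binary, referenced by any parent.
@Entity
class DocumentRef {
    @Id
    Long id;
    String mongoId;  // id of the binary stored in MongoDB
    String fileName;
}

@Entity
class Invoice {
    @Id
    Long id;

    @ManyToOne
    DocumentRef document; // many invoices may point at the same document
}

@Entity
class Contract {
    @Id
    Long id;

    @ManyToOne
    DocumentRef document;
}
```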

DAO design pattern and using it across multiple tables

I'm looking for feedback on the Data Access Object design pattern and using it when you have to access data across multiple tables. It seems like that pattern, which has a DAO for each table along with a Data Transfer Object (DTO) that represents a single row, isn't too useful when dealing with data from multiple tables. I was thinking about creating a composite DAO and a corresponding DTO that would return the result of, say, performing a join on two tables. This way I can use SQL to grab all the data, instead of first grabbing data from one table using one DAO, then from the second table using the second DAO, and then composing them together in Java.
Is there a better solution? And no, I'm not able to move to Hibernate or another ORM tool at the moment. Just straight JDBC for this project.
I would agree with your approach. My DAOs tend to be aligned more at the object level, rather than from a DB Table perspective. I may manage more than one object through a DAO, but they will very likely be closely related. There is no reason not to have SQL accessing two tables living in one DAO.
And for the record, I have banished the acronym DTO from my vocabulary and code.
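As a rough sketch of such a composite DAO with plain JDBC (the tables, columns, and classes are made up):

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// A DAO that spans two tables with a single JOIN instead of composing in Java.
class OrderSummaryDao {
    private final Connection connection;

    OrderSummaryDao(Connection connection) {
        this.connection = connection;
    }

    List<OrderSummary> findByCustomer(long customerId) throws SQLException {
        String sql = "SELECT o.id, o.total, c.name "
                   + "FROM orders o JOIN customers c ON o.customer_id = c.id "
                   + "WHERE c.id = ?";
        List<OrderSummary> results = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    results.add(new OrderSummary(
                            rs.getLong("id"),
                            rs.getBigDecimal("total"),
                            rs.getString("name")));
                }
            }
        }
        return results;
    }
}

// The composite result object spanning both tables.
class OrderSummary {
    final long orderId;
    final BigDecimal total;
    final String customerName;

    OrderSummary(long orderId, BigDecimal total, String customerName) {
        this.orderId = orderId;
        this.total = total;
        this.customerName = customerName;
    }
}
```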
Ideally, how you store your data in a database, and then how you access it, should be derived from the nature of the relationships among the domain entities in your domain model. That is, Relational Model should follow from Domain Model. For example, suppose you have two entities, say, User and Address.
Scenario #1: Addresses are never accessed independently; they are always an attribute of User.
In this case, Address is a Value Object and User is an Entity, and there are guides on how to store this relationship. One way is to store the attributes of Address alongside the attributes of User, in a single table. In this case, UserDao will handle both objects.
Scenario #2: Address can be associated with a User, but can also stand on its own, as an entity.
In this case, an approach different from the first one is needed. You may have a separate DAO and table for the Address type.
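A hedged sketch of scenario #1 (hypothetical users table with the address columns inlined; plain JDBC): Address never gets its own table or DAO.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Scenario #1: Address is a value object with no table or DAO of its own;
// its columns live inline in the users table and UserDao maps both objects.
class Address {
    final String street;
    final String city;
    Address(String street, String city) { this.street = street; this.city = city; }
}

class User {
    final long id;
    final String name;
    final Address address; // owned by User, no independent lifecycle
    User(long id, String name, Address address) {
        this.id = id; this.name = name; this.address = address;
    }
}

class UserDao {
    private final Connection connection;
    UserDao(Connection connection) { this.connection = connection; }

    User findById(long id) throws SQLException {
        String sql = "SELECT name, street, city FROM users WHERE id = ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                Address address = new Address(rs.getString("street"), rs.getString("city"));
                return new User(id, rs.getString("name"), address);
            }
        }
    }
}
```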
My point is that this important idea is too often ignored: the domain model should be the core of the application, driving the other layers.
For instance, if your domain model is properly defined and you are well aware of the types of entities you have and the relationships among them, then your persistence (relational tables and their relationships, your DAOs, etc.) will evolve as a very logical consequence of what you have in the domain model.
In other words, if you spend some time studying your model, you will be able to trace your problem of how to organize your DAOs back to a place in the domain model. If you can clearly define the types of the objects and the nature of the relationships among them in the domain model, it will help you resolve your problem in the data access layer.
