I am going to use Elasticsearch as the search repository in my application.
I have a few questions regarding best practice when it comes to organizing
objects in the search index when the objects have associations/relations to each other.
From what I know, a search index is a flat structure and doesn't work with the concept of
relations in the same way as a database.
Let's say you have these domain objects:
Person:
- Has a one-to-many relationship with Car
Car:
- Is owned by one Person, many-to-one with Person
Department:
- Each Department has many People and each Person may belong to many Departments (many-to-many)
What would be the best way to store this in the search index? What are the options? For instance, I want to find all the people belonging to a certain department, or all people whose car has more than 300 bhp.
I am using the Java client API, if it matters.
Elasticsearch (or Lucene) isn't a relational database, so you need to flatten your relationship model.
Try to model a view with this structure:
Car|Person|Department
This will give you all the attributes required to look up a car. This can be imported into a document for Car.
Similarly,
Person|Department
will give you all the information for a person. This will help you look up a Person.
Department can be a third document.
You can have multiple documents for each entity. But the relationship needs to be translated as a property of the entity.
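A minimal sketch of that flattening, assuming plain Java Maps stand in for the JSON documents you would send to the index. The field names ("departments", "bhp", "ownerName", "ownerDepartments") are hypothetical, and no Elasticsearch client calls are shown. Each person document copies in its department names, following the Person|Department view. Each car document copies in its owner's data, following the Car|Person|Department view. Both example lookups from the question then become simple filters on flat fields, with no join.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: Maps stand in for indexed JSON documents; field names are hypothetical.
class FlattenedIndexSketch {
    record Car(String model, int bhp) {}
    record Person(String name, List<Car> cars, List<String> departments) {}

    // Person|Department view: one document per person, department names copied in.
    static Map<String, Object> personDocument(Person p) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", p.name());
        doc.put("departments", List.copyOf(p.departments()));
        return doc;
    }

    // Car|Person|Department view: one document per car, owner data copied in.
    static List<Map<String, Object>> carDocuments(Person p) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Car car : p.cars()) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("model", car.model());
            doc.put("bhp", car.bhp());
            doc.put("ownerName", p.name());
            doc.put("ownerDepartments", List.copyOf(p.departments()));
            docs.add(doc);
        }
        return docs;
    }

    // "All people in a department": a filter on a flat field of the person documents.
    static List<String> peopleInDepartment(List<Map<String, Object>> personDocs, String dept) {
        List<String> names = new ArrayList<>();
        for (Map<String, Object> doc : personDocs) {
            if (((List<?>) doc.get("departments")).contains(dept)) {
                names.add((String) doc.get("name"));
            }
        }
        return names;
    }

    // "All people whose car has more than minBhp": a range filter on the car
    // documents, returning each owner only once.
    static Set<String> ownersWithCarAbove(List<Map<String, Object>> carDocs, int minBhp) {
        Set<String> owners = new LinkedHashSet<>();
        for (Map<String, Object> doc : carDocs) {
            if ((Integer) doc.get("bhp") > minBhp) {
                owners.add((String) doc.get("ownerName"));
            }
        }
        return owners;
    }
}
```

The trade-off is that the copied data must be re-indexed whenever a person changes departments or a car changes owner.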
Related
I'm confused, when designing client software with database integration, about what should be a member variable of the class and what should just be a query to the database. Let me be specific with a trivial example:
Say I have a Student class, which has a list of "friends" that are Student objects. Should my software design have an ArrayList<Student> as a member variable of the Student class, or should the database deal with the relationship itself, with the Student class not accounting for those "friends" at all? What should a proper UML class diagram look like in this case?
This question is broader than you may think, as there are many ways to deal with it. Here are some first ideas:
Let's start with a quick class diagram. The friendship between students is a many-to-many association.
In a database, a many-to-many association is usually implemented using an association table. So you'd have two tables: STUDENTS, and FRIENDSHIPS with pairs of ids of befriended students.
To load a Student object from the database, you'd read the data in a STUDENTS row and use it to initialize your object. For the friendship, you'd have to read the relevant FRIENDSHIPS rows.
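Here is a small in-memory illustration of reading the relevant FRIENDSHIPS rows. It assumes each friendship is stored once as a (studentA, studentB) pair, so a lookup has to check both columns. The column names in the SQL comment are hypothetical, and the List of rows stands in for the real table.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for the FRIENDSHIPS association table.
// Assumption: each friendship is stored once as (studentA, studentB).
class FriendshipTableSketch {
    record FriendshipRow(int studentA, int studentB) {}

    // Roughly equivalent SQL (column names hypothetical):
    //   SELECT student_b FROM FRIENDSHIPS WHERE student_a = ?
    //   UNION ALL
    //   SELECT student_a FROM FRIENDSHIPS WHERE student_b = ?
    static List<Integer> friendIdsOf(int studentId, List<FriendshipRow> friendships) {
        List<Integer> ids = new ArrayList<>();
        for (FriendshipRow row : friendships) {
            if (row.studentA() == studentId) {
                ids.add(row.studentB());
            } else if (row.studentB() == studentId) {
                ids.add(row.studentA());
            }
        }
        return ids;
    }
}
```

Storing both directions instead (two rows per friendship) would simplify the lookup to one column, at the cost of keeping the two rows consistent.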
But how to use these tables in the application?
A first possibility would be to load each friend Student and insert it in the ArrayList<Student>. But each loaded student is like the first student and could have friends of its own that you'd have to load as well! You'd end up loading a lot of students, if not all of them, just to get the single one you're interested in.
A second possibility would be to use an ArrayList<StudentId> instead of an ArrayList<Student> and populate it. You'd then load the friends just in time, only when needed. But this would require more significant changes in your application.
A third possibility is not to expose an ArrayList. Not leaking the internals is always a good idea. Instead use a getter. So you'd load the friends only if student.getFriends() is called. This is a convenient approach, as you'd have a collection of friends at your disposal, but avoid being caught in a recursive loading of friends of friends.
In all cases, you may want to use a repository object to get individual students or collections of students, and to encapsulate the database handling.
Advice: as said, there are many more options. The repository is one approach, but there are also active records, table gateways and others. For a full overview, see Martin Fowler's book Patterns of Enterprise Application Architecture.
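A minimal sketch combining the lazy getter with a repository. StudentRepository and InMemoryStudentRepository are hypothetical names, and the in-memory maps stand in for the STUDENTS and FRIENDSHIPS queries. getFriends() loads the direct friends on first call and caches them. Each friend is a fresh Student whose own friends stay unloaded, so there is no recursive loading of friends of friends.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the repository hides data access; here it is backed by maps.
class LazyFriendsSketch {
    interface StudentRepository {
        Student findById(int id);
        List<Integer> findFriendIds(int id);
    }

    static class Student {
        private final int id;
        private final String name;
        private final StudentRepository repository;
        private List<Student> friends; // null until getFriends() is first called

        Student(int id, String name, StudentRepository repository) {
            this.id = id;
            this.name = name;
            this.repository = repository;
        }

        int getId() { return id; }
        String getName() { return name; }

        // Loads direct friends once; their own friends are not loaded here.
        List<Student> getFriends() {
            if (friends == null) {
                List<Student> loaded = new ArrayList<>();
                for (int friendId : repository.findFriendIds(id)) {
                    loaded.add(repository.findById(friendId));
                }
                friends = Collections.unmodifiableList(loaded);
            }
            return friends;
        }
    }

    // In-memory repository; "loads" counts findById calls for illustration.
    static class InMemoryStudentRepository implements StudentRepository {
        private final Map<Integer, String> names = new HashMap<>();
        private final Map<Integer, List<Integer>> friendIds = new HashMap<>();
        int loads = 0;

        void addStudent(int id, String name, List<Integer> friends) {
            names.put(id, name);
            friendIds.put(id, friends);
        }

        @Override
        public Student findById(int id) {
            loads++;
            return new Student(id, names.get(id), this);
        }

        @Override
        public List<Integer> findFriendIds(int id) {
            return friendIds.getOrDefault(id, List.of());
        }
    }
}
```

Note that the returned list is unmodifiable, in line with not leaking the internals: changing friendships would go through the repository rather than through the list.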
You need a one-to-many relationship between Student and friends in both the relational database and the object model.
I am trying to build a Book app.
The entities are as follows:
The User entity has a field called Role which specifies whether the user is an Author or a Reviewer. Currently, I am using an Enum data member to differentiate between Author and Reviewer. A single User instance cannot be both.
A Book will have 1..* Authors, so a User entity that is an Author will have a many-to-many relation with the Book entity.
A Book will have 1..* Reviewers, so a User entity that is a Reviewer will have a many-to-many relation with the Book entity.
I was wondering how to implement the User side. I would prefer one single collection of Books: if the User is an Author, it will contain references to the books he has authored; if the User is a Reviewer, it will contain references to the books he has reviewed.
I am wondering if there is any construct in JPA/Hibernate that enables this.
I could always implement Author and Reviewer as different entities, but I would still like to know the answer for such a situation, which I presume is quite commonly encountered.
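One possible shape, shown in plain Java rather than as a JPA mapping. Book keeps two separate associations (authors and reviewers), and User exposes a single derived getBooks() whose meaning depends on its Role. In JPA this would roughly correspond to two mappedBy collections on User plus an unmapped getter. That mapping is an assumption and is not shown. Names other than User, Book and Role are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch, not a JPA/Hibernate mapping.
class BookRolesSketch {
    enum Role { AUTHOR, REVIEWER }

    static class User {
        final String name;
        final Role role;
        final Set<Book> authored = new LinkedHashSet<>();
        final Set<Book> reviewed = new LinkedHashSet<>();

        User(String name, Role role) {
            this.name = name;
            this.role = role;
        }

        // The single collection: authored books for an author,
        // reviewed books for a reviewer.
        Set<Book> getBooks() {
            return Collections.unmodifiableSet(role == Role.AUTHOR ? authored : reviewed);
        }
    }

    static class Book {
        final String title;
        final List<User> authors = new ArrayList<>();
        final List<User> reviewers = new ArrayList<>();

        Book(String title) { this.title = title; }

        // Keeps both sides of the many-to-many in sync and enforces the role.
        void addAuthor(User u) {
            if (u.role != Role.AUTHOR) throw new IllegalArgumentException(u.name + " is not an author");
            authors.add(u);
            u.authored.add(this);
        }

        void addReviewer(User u) {
            if (u.role != Role.REVIEWER) throw new IllegalArgumentException(u.name + " is not a reviewer");
            reviewers.add(u);
            u.reviewed.add(this);
        }
    }
}
```

Since a single User cannot be both, one of the two sets is always empty, which is what makes a single role-dependent getter reasonable here.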
For MarkLogic (and maybe for noSQL in general?), is it best to store parent-child as one document? Thus, if coming from a relational world, a normalized parent-child table will need to be denormalized and stored as a single document?
Will this design impact how searches are done (since children records now are searched always in the context of the parent)?
It might depend on whether children can have multiple parents or not (e.g. graph-type data instead of hierarchical data), but my reasoning would be that for hierarchical data, storing it in its natural hierarchical form (using XML or JSON or such) makes the most sense. It doesn't mean storing the entire parent-child table as one document, but rather expanding the records back into their original trees, and storing those as documents.
This will not fit all NoSQL solutions, but it will work well for those that fall into the document store category, particularly if they provide good search over contents and hierarchy, like MarkLogic.
Note: graph-type data can be stored as triples inside MarkLogic. That allows querying it with SPARQL and inferencing over it, for instance.
HTH!
It's not that the parent-child relationship is "denormalized", but rather the children are "merged" into the parent.
One thing to consider is the type of relationship you have. UML provides descriptions for different kinds of relationships; see Difference between association, aggregation and composition.
In general (exceptions exist), I think association and aggregation relationships will be between separate documents, whereas composition relationships will be "merged" into a single document.
Concrete example - a person knows many persons (association), a person can own many vehicles (aggregation, a vehicle only has one owner, but its own lifecycle), and a person can have many names (composition). I would create person and vehicle documents, but not name documents - I would store all the names on the person document.
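That rule of thumb can be shown with nested Java Maps standing in for JSON documents (the ids and field names are hypothetical, and no MarkLogic API is used). Names are a composition, so they are embedded in the person document. Known persons and owned vehicles are a plain association and an aggregation, so they remain separate documents linked by id.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: Maps stand in for JSON documents; ids and fields are hypothetical.
class DocumentModelSketch {
    static Map<String, Object> personDocument(String id, List<String> names, List<String> knowsIds) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("names", List.copyOf(names));     // composition: merged into the person
        doc.put("knows", List.copyOf(knowsIds));  // association: ids of other person documents
        return doc;
    }

    static Map<String, Object> vehicleDocument(String id, String model, String ownerId) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("model", model);
        doc.put("ownerId", ownerId);              // aggregation: own document, own lifecycle
        return doc;
    }
}
```

A search on a person's names then hits one document, while a vehicle can be re-assigned or deleted without rewriting the person.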
To me, that's a big advantage of a document database over a relational database. In the latter, I'm forced to create separate tables no matter what kind of relationship I have. In a document database, I can choose what makes the most sense and fits my application's needs. Very often, my physical document model much more closely resembles my application's conceptual model.
I am using Spring Data Neo4j with Spring Boot v1.3.1. My Neo4j version is 2.1.6.
Let's say I have an entity Person, which can have a relationship named Friend with a Set of Person. So, I define a Set as one of the attributes of the entity, use the @RelatedTo annotation and give it a type named Friend.
What if I want to have multiple other relationships, all with the same entity only, let's say Enemy, Acquaintance etc.? Do I have to define separate attributes for all of them? Can't I pass the relationship type dynamically?
For reference:
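import java.util.Set;
import org.neo4j.graphdb.Direction;
import org.springframework.data.neo4j.annotation.Fetch;
import org.springframework.data.neo4j.annotation.GraphId;
import org.springframework.data.neo4j.annotation.NodeEntity;
import org.springframework.data.neo4j.annotation.RelatedTo;

@NodeEntity
public class Person {
    @GraphId private Long id; // node id used by Spring Data Neo4j
    @RelatedTo(type = "FRIEND", direction = Direction.BOTH)
    public @Fetch Set<Person> friends;
    // Do I have to do it like this? This is odd.
    @RelatedTo(type = "ENEMY", direction = Direction.BOTH)
    public @Fetch Set<Person> enemies;
    // getters and setters
}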
EDIT 1-----------
Right now, I'm facing an issue with creating nodes in bulk. The problem is explained below:
After considering the approach suggested by Michael, here is what I have.
Basically, I have to create a lot of nodes in bulk. The Person node will have an attribute with a unique index on it; let's call it name. So, when the Friend or Enemy relationships are created, I want them to be created between persons identified by their unique name.
So, there will be two steps:
Create the Person nodes (takes a lot of time).
Create the relationships between them (does not take much time, around 30-40 ms).
I tried different approaches of creating nodes in bulk.
One approach was to commit the transactions after a certain number of nodes have been saved.
I followed this link.
I'm not sure about the performance improvement, as calling neo4jTemplate.save() still takes around 500 ms.
From my logs:
Time taken to execute save:=612 ms
Time taken to execute save:=566 ms
Is this supposed to be normal?
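The commit-every-N approach comes down to splitting the input into fixed-size batches and saving each batch in its own transaction. A minimal sketch, assuming a hypothetical callback that saves one batch inside a single transaction. In practice that might wrap the neo4jTemplate.save() calls in a transaction, but that wiring is not shown and no Neo4j API is used here. The batch size in the usage below is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of "commit after every N saved nodes". saveBatchInTransaction is a
// hypothetical callback standing in for "open transaction, save all, commit".
class BatchCommitSketch {
    // Returns the number of batches (i.e. commits) performed.
    static <T> int saveInBatches(List<T> items, int batchSize, Consumer<List<T>> saveBatchInTransaction) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so the callback cannot modify the caller's list.
            saveBatchInTransaction.accept(new ArrayList<>(items.subList(start, end)));
            batches++;
        }
        return batches;
    }
}
```

For example, 2,500 names with a batch size of 1,000 give three commits of 1,000, 1,000 and 500 nodes. Batching only reduces the per-commit overhead. It does not make each individual save() cheaper.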
Another approach was using Cypher, as suggested by Michael in his blog, here.
I used a Cypher query like this:
WITH [{name: "Person1", gravity: 1},
{name: "Person2", gravity: 2}] AS group
FOREACH (person IN group |
CREATE (e:Person {label: person.name, gravity: person.gravity}))
The issue with this approach is that the nodes do get created in bulk, but the unique index on the name attribute is ignored. It seems I must commit after saving each node.
So, is there any other way to create nodes in bulk faster?
You can handle it by creating relationship entities of the different types.
Or by using Neo4jTemplate directly (createRelationshipBetween).
If you have such a dynamic setup, what would your entities look like?
You don't have to list the relationships in your entity. If they are dynamic you can also just have base attributes in your entities and access the relationships via a repository.
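To illustrate the idea of keeping only base attributes on the entity and looking up relationships by type, here is a plain-Java stand-in. It is not the Spring Data Neo4j API: RelationshipStore, relate() and related() are hypothetical names. The relationship type is an ordinary string parameter, so FRIEND, ENEMY or ACQUAINTANCE need no dedicated field, and relationships are stored in both directions to mirror Direction.BOTH.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// In-memory sketch of dynamic relationship types; not Spring Data Neo4j API.
class DynamicRelationshipsSketch {
    // Entity with base attributes only; no per-type relationship fields.
    record Person(String name) {}

    static class RelationshipStore {
        // type -> (person -> related persons)
        private final Map<String, Map<Person, Set<Person>>> byType = new HashMap<>();

        // Undirected, like Direction.BOTH: store the edge on both ends.
        void relate(Person a, String type, Person b) {
            Map<Person, Set<Person>> edges = byType.computeIfAbsent(type, t -> new HashMap<>());
            edges.computeIfAbsent(a, p -> new LinkedHashSet<>()).add(b);
            edges.computeIfAbsent(b, p -> new LinkedHashSet<>()).add(a);
        }

        // The relationship type is passed in at call time.
        Set<Person> related(Person p, String type) {
            return byType.getOrDefault(type, Map.of()).getOrDefault(p, Set.of());
        }
    }
}
```

With a real graph, the related() lookup would typically become a repository query parameterised by relationship type instead of a map lookup.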
I need help with modeling a set of tables/classes in my project. I also need help with the Hibernate mapping for these tables. I have the following tables in my project:
Person
Organization
Contact
Address
Person table can have one or more addresses. Organization and Contact can have only one Address. So I have added the following columns to establish the relationship between the tables.
Address table has PersonId [Since one Person can have more than one address].
Organization and Contact table has AddressId [Since these tables can have only one address].
I want to know what java classes I need to create for these tables. Currently I have Person, Address, Organization and Contact classes. Not sure how to link Address class to Person, Organization and Contact class.
I want to know whether I should consider Address a Component or an Entity, and how to create the XML mapping if Address is considered a Component.
The PersonId in the Address table may have null values for the Address records created for Organization and Contact. I am fine with creating a separate table [say Person_Address] to store the list of addresses for a person. But such a link table would also allow a many-to-many relationship between the Person and Address tables. How do I enforce a one-to-many relationship in this case?
I would go for what you suggest in 3.
By using a OneToMany relationship on the set of addresses in the Person class the intermediate table will be created automatically and I guess it will have a unique constraint on the address id.
Edit: You will only get a reference column in the Address table if you add a corresponding many-to-one annotation in the Address class and map the one-to-many annotation to that field. Since you're not doing that, you'll get a Person_Address table even with a one-to-many relationship, without creating any extra classes.
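What keeps such a link table one-to-many is uniqueness on the address side: an address id may appear in at most one row. Below is a plain-Java sketch of that rule, where PersonAddressLinkSketch and its methods are hypothetical names and a map keyed by address id plays the role of the unique constraint. Whether your JPA provider generates exactly this constraint is an assumption worth checking in the generated DDL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of a Person_Address link table with a unique constraint on addressId.
class PersonAddressLinkSketch {
    // Keyed by addressId: each address maps to at most one person.
    private final Map<Long, Long> personByAddress = new HashMap<>();

    void link(long personId, long addressId) {
        Long existing = personByAddress.putIfAbsent(addressId, personId);
        if (existing != null && existing != personId) {
            // Would violate the unique constraint: this would make it many-to-many.
            throw new IllegalStateException(
                "address " + addressId + " already belongs to person " + existing);
        }
    }

    Set<Long> addressesOf(long personId) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Long> e : personByAddress.entrySet()) {
            if (e.getValue() == personId) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A person can still have many addresses, because nothing restricts how often personId appears.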
You should model your classes like you would in plain Java, i.e. Person would have a set of addresses, the others will have just one, and the Address class is oblivious of the others.
Then you add a @OneToMany annotation in the Person class and a @OneToOne annotation in the others. Or you put that in your orm.xml, although annotations are much better for maintenance.
As for Component/Embeddable vs Entity, I would suggest Entity, as it is the simplest and has no limitations. Don't use too many concepts at once; stick to the main road.
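The plain-Java shape that answer describes could look like this, before any mapping is added. The JPA annotations it mentions (@OneToMany on Person.addresses, @OneToOne on Organization.address and Contact.address) are deliberately left out so the sketch compiles without a JPA provider. The street and city fields are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java model; mapping annotations intentionally omitted.
class AddressModelSketch {
    static class Address {
        final String street; // hypothetical fields
        final String city;

        Address(String street, String city) {
            this.street = street;
            this.city = city;
        }
        // No back-references: Address does not know who owns it.
    }

    static class Person {
        final Set<Address> addresses = new LinkedHashSet<>(); // one-to-many
    }

    static class Organization {
        Address address; // one-to-one
    }

    static class Contact {
        Address address; // one-to-one
    }
}
```

Keeping Address free of back-references is what lets the same class serve Person, Organization and Contact without nullable owner columns.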