Simple Database Design - Employee Manager Relationship - java

I'm new to database design, and I was curious if I am approaching a problem the wrong way.
So, I'm creating a simple application with the requirement of persisting an Employee entity. Some Employees may be Managers with a list of Employees that serve under them. What is the best practice to create this scenario in database design?
Currently I have two tables. One table called employee, which contains columns for ID, name, etc. I have a second table defining a manager-employee relationship. This table contains a column MANAGER_ID, and a column EMPLOYEE_ID. To figure out what employees a manager has, I have to join with this relationship table, and grab each unique employee_id for that specific manager_id. Is this a good way to do this?
If not, can you explain why it is bad, and an example of better design?

Presuming that you're talking about a relational database, this seems entirely plausible. The employer / employee relationship is a very common database example, and an internet search shouldn't take too long to validate your design.
If you're new to database design, you could do a lot worse than to get yourself a decent book on the subject such as "Handbook of Relational Database Design" by Fleming / von Halle, which is pretty old but gives a good grounding in the basics, which a surprisingly large number of programmers are unaware of. That's just one decent book on the subject - there are plenty of others.

The alternative is to use a self-join, viz. each row in the employee table has an additional column manager_id that refers to another row of the same table. You then do a self-join when querying.
See an example (for your exact case) here:
https://blog.udemy.com/sql-self-join/
Not sure if there are any benefits of one approach versus the other.
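To make the self-join idea concrete, here is a minimal in-memory sketch in Java. The class and field names (SelfJoinSketch, Employee, managerId) are made up for illustration and stand in for an employee table with a manager_id column. One practical difference worth noting: a single manager_id column allows at most one manager per employee, whereas the separate relationship table from the question also allows several.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an "employee" table that references itself.
class SelfJoinSketch {
    static final class Employee {
        final int id;
        final String name;
        final Integer managerId; // null when the employee has no manager

        Employee(int id, String name, Integer managerId) {
            this.id = id;
            this.name = name;
            this.managerId = managerId;
        }
    }

    // Roughly what this SQL self-join would return:
    //   SELECT e.name FROM employee e
    //   JOIN employee m ON e.manager_id = m.id
    //   WHERE m.id = ?
    static List<String> reportsOf(List<Employee> employees, int managerId) {
        List<String> names = new ArrayList<>();
        for (Employee e : employees) {
            if (e.managerId != null && e.managerId == managerId) {
                names.add(e.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "Alice", null),
                new Employee(2, "Bob", 1),
                new Employee(3, "Carol", 1),
                new Employee(4, "Dave", 2));
        System.out.println(reportsOf(employees, 1)); // prints [Bob, Carol]
    }
}
```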

Related

JPA Single foreign Key ID to refer to two (or more) tables

Simple question really. This is using JPA on Java, and what I want to do is to have a table with a column which can refer to one of two tables. To make this clearer, I can have a 'User' table and a 'TempPerson' table. I don't want to pollute my User table (as I use it for security as well, plus it has other info). Now let's say I have a third table called 'Game'. When someone starts a game against someone, they can play against someone already in the system, i.e. a User, or against someone whose name they type in, in which case a new TempPerson entry is created and used. So the game's player2 (or player1) will be a mapped id to either User.id or TempPerson.id. Now I understand that a determining column may need to be placed into Game to say which table the id belongs to, but I hope JPA will cater for it somehow. Any ideas will be helpful; I could use inheritance but I'm not sure about it.
Here is another example:
Lets say I have a table which holds information about images => id, resolution, width, height, location, bucket .... id_in_the_table_where_used, table_name_of_where_used. Now, this one table can hold the images for profiles, places, etc... and the profiles, places will have an id referring to the images table, but I also would like the images table to have an id back to where the images is used, which table and which id is using it.
It is almost as if I am asking for a 'one to many tables' solution. Although I could have many in-between tables etc., that seems overkill to do something quite simple, although many DBAs may be cursing this idea. It does minimise queries, number of tables etc...
Thanks in advance
It is possible to use a single FK column to target multiple tables. You would have to use @JoinColumn for that:
import java.util.List;
import javax.persistence.*;
@Entity
public class User {
    // @Id field and other columns omitted
    @OneToOne(targetEntity = Avatar.class)
    @JoinColumn(name = "universalId")
    private Avatar avatar;
    @OneToMany(targetEntity = Log.class)
    @JoinColumn(name = "universalId")
    private List<Log> logs;
}
This would use the universalId column to look up related records from the Avatar and Log tables.
This, however, is rather an anti-pattern, with many consequences when, for example, universalId has to be changed. 1 column = 1 FK - go that way.
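One way to read "1 column = 1 FK" for the Game example in the question is to give Game two nullable columns, one per target table, and require that exactly one of them is set. The plain-Java sketch below only illustrates that invariant; PlayerRef, userId and tempPersonId are hypothetical names, and the JPA mapping itself is left out.

```java
// Hypothetical value type for "player2 is either a User or a TempPerson".
class GamePlayerSketch {
    static final class PlayerRef {
        final Long userId;       // would be an FK to User.id, or null
        final Long tempPersonId; // would be an FK to TempPerson.id, or null

        PlayerRef(Long userId, Long tempPersonId) {
            // Exactly one of the two references must be present.
            if ((userId == null) == (tempPersonId == null)) {
                throw new IllegalArgumentException(
                        "exactly one of userId / tempPersonId must be set");
            }
            this.userId = userId;
            this.tempPersonId = tempPersonId;
        }

        static PlayerRef ofUser(long id) {
            return new PlayerRef(id, null);
        }

        static PlayerRef ofTempPerson(long id) {
            return new PlayerRef(null, id);
        }

        boolean isRegisteredUser() {
            return userId != null;
        }
    }

    public static void main(String[] args) {
        PlayerRef p2 = PlayerRef.ofTempPerson(3L);
        System.out.println(p2.isRegisteredUser()); // prints false
    }
}
```

Each column then keeps an ordinary foreign key constraint to a single table, at the cost of one extra nullable column per target.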

Java - Google App Engine - modelling graph structures in Google Datastore

Google App Engine offers the Google Datastore as the only NoSQL database (I think it is based on BigTable).
In my application I have a social-like data structure and I want to model it as I would do in a graph database. My application must save heterogeneous objects (users,files,...) and relationships among them (such as user1 OWNS file2, user2 FOLLOWS user3, and so on).
I'm looking for a good way to model this typical situation, and I thought of two families of solutions:
List-based solutions: Any object contains a list of other related objects and the object presence in the list is itself the relationship (as Google said in the JDO part https://developers.google.com/appengine/docs/java/datastore/jdo/relationships).
Graph-based solution: Both nodes and relationships are objects. The objects exist independently of the relationships, while each relationship contains a reference to the two (or more) connected objects.
What are strong and weak points of these two approaches?
About approach 1: This is the simpler approach one can think of, and it is also presented in the official documentation but:
Each directed relationship makes the object record grow: are there any limitations on the number of possible relationships, given for instance by the object size limit?
Is that a JDO feature, or does the datastore structure also allow that approach to be implemented naturally?
The relationship search time will increase with the list; is this solution suitable for large numbers (millions) of relationships?
About approach 2: Each relationship can have a higher level of characterization (it is an object and it can have properties). And I think memory size is not a Google problem, but:
Each relationship requires its own record, so the search time for each related couple will increase as the total number of relationships increases. Is this suitable for large numbers of relationships (millions, billions)? I.e. does Google have good tricks to search among records if they are well structured? Or will I soon be in a situation in which, if I want to search for a friend of User1 called User4, I have to wait seconds?
On the other side each object doesn't increase in dimension as new relationships are added.
Could you help me find other important points about the two approaches, so that I can choose the best model?
First, the search time in the Datastore does not depend on the number of entities that you store, only on the number of entities that you retrieve. Therefore, if you need to find one relationship object out of a billion, it will take the same time as if you had just one object.
Second, the list approach has a serious limitation called "exploding indexes". You will have to index the property that contains a list to make it searchable. If you ever use a query that references more than just this property, you will run into this issue - google it to understand the implications.
Third, the list approach is much more expensive. Every time you add a new relationship, you will rewrite the entire entity at considerable writing cost. The reading costs will be higher too if you cannot use keys-only queries. With the object approach you can use keys-only queries to find relationships, and such queries are now free.
UPDATE:
If your relationships are directed, you may consider making Relationship entities children of User entities, and using an Object id as an id for a Relationship entity as well. Then your Relationship entity will have no properties at all, which is probably the most cost-efficient solution. You will be able to retrieve all objects owned by a user using keys-only ancestor queries.
I have an AppEngine application and I use both approaches. Which is better depends on two things: the practical limits of how many relationships there can be and how often the relationships change.
NOTE 1: My answer is based on experience with Objectify and heavy use of caching. Mileage may vary with other approaches.
NOTE 2: I've used the term 'id' instead of the proper DataStore term 'name' here. Name would have been confusing and id matches objectify terms better.
Consider users linked to the schools they've attended and vice versa. In this case, you would do both. Link the users to schools with a variation of the 'List' method. Store the list of school ids the user attended as a UserSchoolLinks entity with a different type/kind but with the same id as the user. For example, if the user's id = '6h30n' store a UserSchoolLinks object with id '6h30n'. Load this single entity by key lookup any time you need to get the list of schools for a user.
However, do not do the reverse for the users that attended a school. For that relationship, insert a link entity. Use a combination of the school's id and the user's id for the id of the link entity. Store both id's in the entity as separate properties. For example, the SchoolUserLink for user '6h30n' attending school 'g3g0a3' gets id 'g3g0a3~6h30n' and contains the fields: school=g3g0a3 and user=6h30n. Use a query on the school property to get all the SchoolUserLinks for a school.
Here's why:
Users will see their schools frequently but change them rarely. Using this approach, the user's schools will be cached and won't have to be fetched every time they hit their profile.
Since you will be getting the user's schools via a key lookup, you won't be using a query. Therefore, you won't have to deal with eventual consistency for the user's schools.
Schools may have many users that attended them. By storing this relationship as link entities, we avoid creating a huge single object.
The users that attended a school will change a lot. This way we don't have to write a single, large entity frequently.
By using the id of the User entity as the id for the UserSchoolLinks entity we can fetch the links knowing just the id of the user.
By combining the school id and the user id as the id for the SchoolUserLink, we can do a key lookup to see if a user and school are linked. Once again, no need to worry about eventual consistency for that.
By including the user id as a property of the SchoolUserLink we don't need to parse the SchoolUserLink object to get the id of the user. We can also use this field to check consistency between both directions and have a fallback in case somehow people are attending hundreds of schools.
Downsides:
1. This approach violates the DRY principle. Seems like the least of evils here.
2. We still have to use a query to get the users who attended a school. That means dealing with eventual consistency.
Don't forget to update the UserSchoolLinks entity and add/remove the SchoolUserLink entity in a single transaction.
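The id scheme described in this answer can be sketched without any datastore API. In the Java sketch below, two in-memory maps stand in for the two entity kinds; SchoolLinkSketch and its method names are invented for illustration, and the scan in usersOf only imitates a query on the school property.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the two entity kinds described above.
class SchoolLinkSketch {
    static final class SchoolUserLink {
        final String school;
        final String user;

        SchoolUserLink(String school, String user) {
            this.school = school;
            this.user = user;
        }
    }

    // Kind "UserSchoolLinks": id = user id, value = school ids (the list approach).
    private final Map<String, Set<String>> userSchoolLinks = new HashMap<>();
    // Kind "SchoolUserLink": id = schoolId~userId (the link-entity approach).
    private final Map<String, SchoolUserLink> schoolUserLinks = new HashMap<>();

    static String linkId(String schoolId, String userId) {
        return schoolId + "~" + userId;
    }

    // In a real datastore both writes would belong in one transaction.
    void attend(String userId, String schoolId) {
        userSchoolLinks.computeIfAbsent(userId, k -> new TreeSet<>()).add(schoolId);
        schoolUserLinks.put(linkId(schoolId, userId), new SchoolUserLink(schoolId, userId));
    }

    // Key lookup: only the user's id is needed.
    Set<String> schoolsOf(String userId) {
        return userSchoolLinks.getOrDefault(userId, Collections.<String>emptySet());
    }

    // Key lookup on the combined id: no query involved.
    boolean isLinked(String schoolId, String userId) {
        return schoolUserLinks.containsKey(linkId(schoolId, userId));
    }

    // Imitates a query on the "school" property of the link entities.
    List<String> usersOf(String schoolId) {
        List<String> users = new ArrayList<>();
        for (SchoolUserLink link : schoolUserLinks.values()) {
            if (link.school.equals(schoolId)) {
                users.add(link.user);
            }
        }
        Collections.sort(users);
        return users;
    }

    public static void main(String[] args) {
        SchoolLinkSketch store = new SchoolLinkSketch();
        store.attend("6h30n", "g3g0a3");
        System.out.println(linkId("g3g0a3", "6h30n")); // prints g3g0a3~6h30n
        System.out.println(store.isLinked("g3g0a3", "6h30n")); // prints true
    }
}
```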
Your question is too complex, but I will try to explain the best solution (I will answer in Python, but the same can be done in Java).
from google.appengine.ext import db
class User(db.Model):
    followers = db.StringListProperty()
Adding a follower is simple:
user = User.get(key)
user.followers.append(str(followerKey))
user.put()  # persist the updated list
This allows fast queries both for a user's followers and for who a given user follows:
User.all().filter('followers =', str(followerKey))  # -> users followed by followerKey
This query is costly in I/O, so you can make reads faster at the price of more complexity and more I/O on writes:
class User(db.Model):
    followers = db.StringListProperty()
    follows = db.StringListProperty()
However, this complicates changes: deleting a User requires updating the follows lists of other users as well, so you need 2 writes.
You can also store relationships as separate entities, but that is the worst scenario, since it is more complex than the second example with followers and follows. Keep in mind that an entity can be at most 1 MB; that may not be a limit for you, but it can be.

Table relationships Many to many without a middle table in sql?

I am making an app that can look up creatures, and in the process I am attempting to increase my knowledge.
I have a table Creatures and a table Skills.
A creature can have multiple skills and a skill can be used by multiple creatures.
I am coding in java using sql manager.
I am using 1,2 to represent skills in the creature table and reference the skills table using the numerical values.
One thought I had was is there a way to make an overloaded stored procedure?
I have not started coding yet as I am still planning but would appreciate any ideas sent my way.
I am not trying to avoid the middle table, just to see if there is another way to do it that is not so hard that it's pointless.
You will probably need the middle table.
Storing a comma-separated list of skills in the Creatures table makes it easy to fetch the skills per creature, but what if you ever want to know the creatures who have a given skill?
Comma-separated lists are fraught with problems. You can use them to optimize one way of accessing the data, but that causes a drastic de-optimization of other ways of accessing the data.
See also my answer to Is storing a delimited list in a database column really that bad?
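As a rough illustration of why the middle table keeps both access paths simple, here is an in-memory Java sketch. The Link class plays the role of a hypothetical creature_skill table with one row per (creature, skill) pair; all names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a creature_skill junction table.
class CreatureSkillSketch {
    static final class Link {
        final int creatureId;
        final int skillId;

        Link(int creatureId, int skillId) {
            this.creatureId = creatureId;
            this.skillId = skillId;
        }
    }

    // Like: SELECT skill_id FROM creature_skill WHERE creature_id = ?
    static List<Integer> skillsOf(List<Link> links, int creatureId) {
        List<Integer> result = new ArrayList<>();
        for (Link link : links) {
            if (link.creatureId == creatureId) {
                result.add(link.skillId);
            }
        }
        return result;
    }

    // Like: SELECT creature_id FROM creature_skill WHERE skill_id = ?
    static List<Integer> creaturesWith(List<Link> links, int skillId) {
        List<Integer> result = new ArrayList<>();
        for (Link link : links) {
            if (link.skillId == skillId) {
                result.add(link.creatureId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
                new Link(1, 1), new Link(1, 2), new Link(2, 2));
        System.out.println(skillsOf(links, 1));      // prints [1, 2]
        System.out.println(creaturesWith(links, 2)); // prints [1, 2]
    }
}
```

Both lookups are the same kind of filter on one column, which is the symmetry a comma-separated list in the Creatures table gives up.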
If you're using a relational database, the "right" and general way to solve it is with a table that will store the relation.
If you want to avoid the middle table, you can put a constraint on the maximum number of skills per creature - let's say max 5 skills, and then have fields called skill1, skill2, ..., skill5. I cannot recommend this option, because it will make querying much more complicated, but for some cases it's possible.
A variation on this option would be a single int or long field, where each bit represents a skill. Still not good in my opinion though.
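For completeness, the bit-per-skill idea could look like the Java sketch below, assuming skill ids are small integers from 0 to 63 so that they fit in one long (the class and method names are hypothetical). It also shows the built-in ceiling: a 65th skill no longer fits.

```java
// Hypothetical helper for packing up to 64 skill ids into one long column.
class SkillBitsSketch {
    static long withSkill(long mask, int skillId) {
        if (skillId < 0 || skillId > 63) {
            throw new IllegalArgumentException("skill id must be in 0..63");
        }
        return mask | (1L << skillId);
    }

    static boolean hasSkill(long mask, int skillId) {
        if (skillId < 0 || skillId > 63) {
            return false;
        }
        return (mask & (1L << skillId)) != 0;
    }

    public static void main(String[] args) {
        long skills = withSkill(withSkill(0L, 1), 2);
        System.out.println(skills);              // prints 6
        System.out.println(hasSkill(skills, 2)); // prints true
        System.out.println(hasSkill(skills, 3)); // prints false
    }
}
```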

relationship and build database

For an exercise I need to build something like this:
For a course I need to create a review that is made up of certain reviewlines and feedbackscores.
This review object (unique instance) needs to be filled in by a list of customers.
Depending on the course the review is for, the review will change (e.g. for one course the number of reviewlines and feedbackscores will change). Each customer can be enrolled in more than one course and each review is specific to him.
Now how should I see the relationship between the "review" object (unique instance) and "customer" if I want to use JPA to save this all to the db?
A customer can have more than one review he/she needs to fill in.
A certain review object needs to be filled in by many customers (but this is a review object with a certain build [reviewlines and feedbackscores]) and is unique for each of them.
Maybe I see it as too complex, but what is the best way to build this?
Try the following:
I think it's covered all your design points.
I am trying to read between the lines of your comments, and I think you want to implement a system where you capture a number of 'rules' for the Review (I'm guessing, but examples may be that reviews can be up to n lines, there must be at least m CustomerReviews before the Review gains a degree of quality). If this is indeed the case, I have created a ReviewTemplate class:
ReviewTemplate would have attributes/columns for each value you would need. These attributes/columns are duplicated on Review
Populate ReviewTemplate with a number of rows, then create a row in Course and link it to one ReviewTemplate
When a Course needs a Review, copy the fields from the ReviewTemplate into the Review
In Java, implement the business rules for Review using the copied values - not the values on ReviewTemplate.
Why copy the values? Well, I bet that at some point, users want to edit the ReviewTemplate table. If so, what happens to the Review objects using the edited ReviewTemplates? Does the modified value on ReviewTemplate somehow invalidate past Reviews and break your business logic? No, because you copied the rule values to Review and so past Reviews will not change.
EDIT: Answers to specific questions
How do you see the duplicating? I can create an entity ReviewTemplate with the specified attributes. In this entity there will be a relationship with reviewlines and feedbackscores.
I see each ReviewTemplate as holding prototypical values for a particular 'type' of Review, which just might include a default reviewLine (but that might not make sense) and a default feedbackScore. When you create the Review, you would do the following:
Instantiate the Review and populate with values from ReviewTemplate
Instantiate as many CustomerReview objects as you need, linking them to the relevant Customer objects (I infer this step from your previous comments. It might also make sense to omit this step until a Customer voluntarily elects to review a Course)
(If appropriate) Populate the CustomerReview attribute feedbackScore with the default value from ReviewTemplate
Instantiate CustomerReviewLine records as appropriate
If you follow this approach, you do not need to add a relationship between ReviewTemplate and CustomerReviewLines.
When I e.g. state that customers 1 to 4 need to fill in the review 4 specific "objects" need to be created that will hold the information and also 4 sets of the needed reviewlines and feedbackscores need to be created so they all can hold the information.
Absolutely.
I just don't know how to implement this in a JPA structure so the information is held in the db ... ?
JPA allows you to attack the problem in many ways, but the best practice is to manually create both the DB schema and the Java classes (eg see https://stackoverflow.com/a/2585763/1395668). Therefore, for each entity in the diagram, you need to:
Write SQL DDL statements to create the table, columns, primary key and foreign keys, and
Write a Java class annotated with @Entity. Within the class, you will also need to annotate the id (primary key) with @Id and the relationships with @OneToMany or @ManyToOne (there are additional parameters in the annotations to set as well).
Now, on the JPA side, you can do things like:
ReviewTemplate template = course.getReviewTemplate(); // assuming the variable course
Review review = new Review();
review.setCourse(course);
review.setRuleOne(template.getRuleOne());
// Copy other properties here
EntityManager em = // get the entity manager here
em.persist(review);
// Assume a set or list of customers
for (Customer customer : customers) {
    CustomerReview cr = new CustomerReview();
    cr.setReview(review);
    cr.setCustomer(customer);
    cr.setFeedbackScore(template.getDefaultFeedbackScore());
    // set other CustomerReview properties here
    em.persist(cr);
    // You can create CustomerReviewLine here as well
}
If written inside a standard EJB Session Bean, this will all be nicely transacted, and you will have all your new records committed into the DB.
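The reason for copying template values, rather than reading them through a link, can be shown in a few lines of plain Java with no JPA involved. In this sketch maxLines is a hypothetical rule value standing in for whatever rules a ReviewTemplate actually holds.

```java
// Minimal illustration of "copy the rule values onto Review".
class ReviewTemplateSketch {
    static final class ReviewTemplate {
        int maxLines; // hypothetical rule value, editable over time

        ReviewTemplate(int maxLines) {
            this.maxLines = maxLines;
        }
    }

    static final class Review {
        final int maxLines; // snapshot taken when the Review is created

        Review(ReviewTemplate template) {
            this.maxLines = template.maxLines;
        }

        // Business rule evaluated against the copied value only.
        boolean accepts(int lineCount) {
            return lineCount <= maxLines;
        }
    }

    public static void main(String[] args) {
        ReviewTemplate template = new ReviewTemplate(5);
        Review past = new Review(template);

        template.maxLines = 10; // someone edits the template later
        Review fresh = new Review(template);

        System.out.println(past.accepts(8));  // prints false
        System.out.println(fresh.accepts(8)); // prints true
    }
}
```

The earlier Review keeps the rule it was created with, which is the behaviour the answer argues for.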
EDIT 2: Additional question
(I'm assuming that the second comment completely supersedes the first)
So when I create a reviewtemplate and I link it to a bunch of customers I write the template to the db and create a bunch of reviews based on the template but linked to the specific customer and with his own unique reviewlines and feedbackscores. Like I see it now the reviewline (more a question or description) is the same for each review (of a template), it is only the score that changes between the customers
I finally think I understand ReviewLine. I had thought it was a place where the Customer enters lines of text that comprise the CustomerReview. I now believe that ReviewLine is a specific question that the Customer is asked, and for which the Customer provides a feedbackScore.
With this understanding, here is an updated ER/Class diagram.
Note that there are some significant changes - there are several more tables:
ReviewLineTemplate provides a place for template questions to be stored on a ReviewTemplate
When a Review is instantiated/inserted (which is a copy of a specific ReviewTemplate), the ReviewLineTemplates are copied as ReviewLines. The copy operation allows two important features:
On creation, a Review and its ReviewLines can be customized without affecting the ReviewTemplate or ReviewLineTemplate
Over time, the ReviewTemplate and ReviewLineTemplate can be updated, edited and continually improved, without changing the questions that the Customer has already answered. If CustomerFeedbackScore were linked to ReviewLineTemplate directly, then editing the ReviewLineTemplate would change the question that the Customer has answered, silently invalidating the feedbackScore.
FeedbackScore has been moved to a join-table between ReviewLine and CustomerReview.
Note that this model is fully normalised, which makes it more 'correct' but harder to build a GUI for. A common 'optimization' might be to introduce:
10 (say) columns on ReviewTemplate and Review called reviewLine1 through reviewLine10.
10 (say) columns on CustomerReview called feedbackScore1 through feedbackScore10.
Remove the ReviewLineTemplate, ReviewLine and CustomerReviewLine tables
Doing so is not normalised, and may introduce a set of other problems. YMMV
The structure of data always depends on the requirements, and there is never a "one-and-only" solution. So, do you need maximised atomicity or a high-performance data system?
The fastest and easiest solution would be not using a database, but hash tables. In your case, you could have something like 3 hash tables for customer, review, and probably another one for the n:n relationship. Or if you're using a database, you could just store an array of the review-primary-keys in one field in the customer table.
However, we all learn in school to do atomicity, so let's do that (I just write the primary/foreign keys!):
Customer (unique_ID, ...)
Review (unique_ID, ...)
Customer_Review (customer_ID, review_ID, ...) --> n:n-relationship
The Customer_Review describes the n:n-relationship between customers and reviews. But if there is only one customer per review possible, you'll do that like this:
Customer (unique_ID, ...)
Review (pk: unique_ID, fk: customer_ID, ...) --> 1:n-relationship
In any case, I suggest learning ERM as a good starting point: http://en.wikipedia.org/wiki/Entity_relationship_model
You need a ManyToMany relation :
One customer -> several reviews.
One review -> several customers.
So you will have 3 tables in your database schema : Customer, review and a junction table with the customer ID and the review ID.
See Wikipedia : Many to Many

Multiple link between 2 tables columns...bad design approach?

Hello
I'm developing a webapp and I'm about to design the database, and I came across this question.
Is it bad design to have more than 1 link between 2 tables?
The picture I have posted is a very quick and small example just to make it clearer.
When I display all the offers, I would also like to show the products they are related to. I could retrieve the product name by loading a product instance using the product id field in the offer object, but that would require more queries and more typing work. So I was thinking of including the product name directly in the offer, so that I can simply retrieve all offers and, only when needed, look up the related product in the DB by its product id.
Would you consider this a bad approach?
I have been looking around for cases like mine but I have only found approaches with 1 connection between tables (with unique ids).
Thank you
This is data denormalization. Don't do it (in most cases). Design the tables correctly, let the database do the correct work with the correct queries. It will be much easier to maintain and work with over time.
Use the ID in the offer table to lookup the product name in the products table.
Yes, this would be bad.
Removing the redundant name would be proper normalization. Just link on the id; that will be the best way.
In general there is no limit to the number of relationships (links) between two tables, but each relationship should have a unique meaning. If, in your example, Product Name and Product ID are both candidate keys and each name always has the same ID then you should definitely not have two PK/FK relationships between these tables.
@Joe is right. Normalization is the best approach to take with database design. The reason is that additions, deletions, and modifications of a field can then be made in just one table and propagated through the rest of the database via the defined relationships.
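The "change it in one place" argument can be sketched in plain Java as follows. A map from product id to name stands in for the products table, and Offer stores only the product id; the names OfferLookupSketch, Offer and describe are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: offers store only the product id, never the name.
class OfferLookupSketch {
    static final class Offer {
        final int id;
        final int productId; // the single link to the products table

        Offer(int id, int productId) {
            this.id = id;
            this.productId = productId;
        }
    }

    // Plays the role of joining offer to product on the product id.
    static String describe(Offer offer, Map<Integer, String> productNames) {
        return "Offer " + offer.id + ": " + productNames.get(offer.productId);
    }

    public static void main(String[] args) {
        Map<Integer, String> productNames = new HashMap<>();
        productNames.put(10, "Widget");
        Offer offer = new Offer(1, 10);

        System.out.println(describe(offer, productNames)); // prints Offer 1: Widget

        // Renaming the product touches one row; the offer needs no update.
        productNames.put(10, "Gadget");
        System.out.println(describe(offer, productNames)); // prints Offer 1: Gadget
    }
}
```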
