Multiple link between 2 tables columns...bad design approach? - java

Hello
I'm developing a webapp and i'm about to design the database, i came across this question.
Is it a bad design to have more then 1 link between 2 tables?
The picture i have posted is a very quick and small example just to make it clearer.
If i would like to display all the offers,i would like to insert also the products they are related to, in this case i could retrieve the product name by creating a product instance retrieved with the product id from the product id field in the offer object, but it would require more queries execution and more typing work, so i was thinking to include the product name directly in the offer so that i can simply retrieve all offers and eventually display the related product by browsing the DB with its product id.
Would you consider this a bad approach?
I have been looking around for cases like mine but i have only found approaches with 1 connection between tables (with unique id's)
Thank you

This is data denormalization. Don't do it (in most cases). Design the tables correctly, let the database do the correct work with the correct queries. It will be much easier to maintain and work with over time.
Use the ID in the offer table to lookup the product name in the products table.

yes this would be bad.
removing the redundant name would be proper normalization. just link on the id, that will be the best way.

In general there is no limit to the number of relationships (links) between two tables, but each relationship should have a unique meaning. If, in your example, Product Name and Product ID are both candidate keys and each name always has the same ID then you should definitely not have two PK/FK relationships between these tables.

#Joe is right. Normalization is the best approach to take with database design. The reason being so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Related

Insert unique entities to the sqlite DB

There will be up to 100k entities in sqlite DB with following structure:
ID (Numeric, PK)
KEY (Varchar, Unique/PK)
Other fields, mostly varchars
I have a list of about 100-1k entities. I want to add to the DB only those entities, which KEY is not present in the DB and show the list of added ones.
Example
As an relevant example you may consider something like this: library with books as entities. Each Book has global unique ISBN number (KEY) and unique id (ID) in the library catalog.
Some person brings to the library set of books. Library checks the books by ISBN in the catalog, takes 'new' books and shows to the person list of taken books.
Some thoughts how it can be achieved:
1) select all KEYs from the DB, put them into [hash]set, in loop verify that KEY from new entities does not exist in the set.
2) like #1 but instead of selecting all KEYs, select only KEYs that present both in DB and the list
3) in loop check existence of entity with additional select query
4) enable constraints in the DB, check existence by catching exceptions
All of them have their own disadvantages, I believe. Can you suggest something better?
For now I'm asking mostly about 'best practices', I believe any of the approach will work for my case without huge performance issues (no actual tests for now, I'm just in analysis phase), but how should it be done better?
Code will be in Java, I plan to use simple DAOs with JDBC, but if someone suggests Hibernate as an alternative approach, I will reconsider my thoughts.
Your suggested solutions are all valid and possible situations, altough Number #1 and Number #4 are rarely good ideas. Number #2 and #3 are solutions that are often used. To choose from the two you need to answer the following questions:
What will be the typical use case for your application? (sometimes you just get 1 new book, you will always get hundreds of books at once, mixed)
Based on what metrics will this code be "judged", what are you trying to optimize for? (readability/maintainability, performance, etc)
Based on the answers to these questions above you can either pick #2 or #3.

Simple Database Design - Employee Manager Relationship

I'm new to database design, and I was curious if I am approaching a problem the wrong way.
So, I'm creating a simple application with the requirement of persisting an Employee entity. Some Employees may be Managers with a list of Employees that server under them. What is the best practice to create this scenario in database design?
Currently I have two tables. One table called employee, which contains columns for ID, name, etc. I have a second table defining a manager-employee relationship. This table contains a column MANAGER_ID, and a column EMPLOYEE_ID. To figure out what employees a manager has, I have to join with this relationship table, and grab each unique employee_id for that specific manager_id. Is this a good way to do this?
If not, can you explain why it is bad, and an example of better design?
Presuming that you're talking about a relational database, this seems an entirely plausible. The employer / employee relationship database example is very common and an internet search shouldn't take too long to validate your design.
If you're new to database design, you could do a it worse than to get yourself a decent book on the subject such as "Handbook of Relational Database Design" by Fleming / von Halle, which is pretty old but gives a good grounding in the basics which a surprising large amount of programmers are unaware of. That's just one decent book on the subject - there are plenty of others.
The alternative is to use a self-join, viz. each row in the employee table has an additional column manager_id. You then do a self-join when querying
See an example (for your exact case) here:
https://blog.udemy.com/sql-self-join/
Not sure if there are any benefits of one approach versus the other.

Java - Google App Engine - modelling graph structures in Google Datastore

Google Apps Engine offers the Google Datastore as the only NoSQL database (I think it is based on BigTable).
In my application I have a social-like data structure and I want to model it as I would do in a graph database. My application must save heterogeneous objects (users,files,...) and relationships among them (such as user1 OWNS file2, user2 FOLLOWS user3, and so on).
I'm looking for a good way to model this typical situation, and I thought to two families of solutions:
List-based solutions: Any object contains a list of other related objects and the object presence in the list is itself the relationship (as Google said in the JDO part https://developers.google.com/appengine/docs/java/datastore/jdo/relationships).
Graph-based solution: Both nodes and relationships are objects. The objects exist independently from the relationships while each relationship contain a reference to the two (or more) connected objects.
What are strong and weak points of these two approaches?
About approach 1: This is the simpler approach one can think of, and it is also presented in the official documentation but:
Each directed relationship make the object record grow: are there any limitations on the number of the possible relationships given for instance by the object dimension limit?
Is that a JDO feature or also the datastore structure allows that approach to be naturally implemented?
The relationship search time will increase with the list, is this solution suitable for large (million) of relationships?
About approach 2: Each relationship can have a higher level of characterization (it is an object and it can have properties). And I think memory size is not a Google problem, but:
Each relationship requires its own record, so the search time for each related couple will increase as the total number of relationships increase. Is this suitable for large amount of relationships(millions, billions)? I.e. does Google have good tricks to search among records if they are well structured? Or I will be soon in a situation in which if I want to search a friend of User1 called User4 I have to wait seconds?
On the other side each object doesn't increase in dimension as new relationships are added.
Could you help me to find other important points on the two approaches in such a way to chose the best model?
First, the search time in the Datastore does not depend on the number of entities that you store, only on the number of entities that you retrieve. Therefore, if you need to find one relationship object out of a billion, it will take the same time as if you had just one object.
Second, the list approach has a serious limitation called "exploding indexes". You will have to index the property that contains a list to make it searchable. If you ever use a query that references more than just this property, you will run into this issue - google it to understand the implications.
Third, the list approach is much more expensive. Every time you add a new relationship, you will rewrite the entire entity at considerable writing cost. The reading costs will be higher too if you cannot use keys-only queries. With the object approach you can use keys-only queries to find relationships, and such queries are now free.
UPDATE:
If your relationships are directed, you may consider making Relationship entities children of User entities, and using an Object id as an id for a Relationship entity as well. Then your Relationship entity will have no properties at all, which is probably the most cost-efficient solution. You will be able to retrieve all objects owned by a user using keys-only ancestor queries.
I have an AppEngine application and I use both approaches. Which is better depends on two things: the practical limits of how many relationships there can be and how often the relationships change.
NOTE 1: My answer is based on experience with Objectify and heavy use of caching. Mileage may vary with other approaches.
NOTE 2: I've used the term 'id' instead of the proper DataStore term 'name' here. Name would have been confusing and id matches objectify terms better.
Consider users linked to the schools they've attended and vice versa. In this case, you would do both. Link the users to schools with a variation of the 'List' method. Store the list of school ids the user attended as a UserSchoolLinks entity with a different type/kind but with the same id as the user. For example, if the user's id = '6h30n' store a UserSchoolLinks object with id '6h30n'. Load this single entity by key lookup any time you need to get the list of schools for a user.
However, do not do the reverse for the users that attended a school. For that relationship, insert a link entity. Use a combination of the school's id and the user's id for the id of the link entity. Store both id's in the entity as separate properties. For example, the SchoolUserLink for user '6h30n' attending school 'g3g0a3' gets id 'g3g0a3~6h30n' and contains the fields: school=g3g0a3 and user=6h30n. Use a query on the school property to get all the SchoolUserLinks for a school.
Here's why:
Users will see their schools frequently but change them rarely. Using this approach, the user's schools will be cached and won't have to be fetched every time they hit their profile.
Since you will be getting the user's schools via a key lookup, you won't be using a query. Therefore, you won't have to deal with eventual consistency for the user's schools.
Schools may have many users that attended them. By storing this relationship as link entities, we avoid creating a huge single object.
The users that attended a school will change a lot. This way we don't have to write a single, large entity frequently.
By using the id of the User entity as the id for the UserSchoolLinks entity we can fetch the links knowing just the id of the user.
By combining the school id and the user id as the id for the SchoolUser link. We can do a key lookup to see if a user and school are linked. Once again, no need to worry about eventual consistency for that.
By including the user id as a property of the SchoolUserLink we don't need to parse the SchoolUserLink object to get the id of the user. We can also use this field to check consistency between both directions and have a fallback in case somehow people are attending hundreds of schools.
Downsides:
1. This approach violates the DRY principle. Seems like the least of evils here.
2. We still have to use a query to get the users who attended a school. That means dealing with eventual consistency.
Don't forget Update the UserSchoolLinks entity and add/remove the SchoolUserLink entity in a transaction.
You question is too complex but I try explain the best solution (I will answer in Python but same can be done in Java).
class User(db.User):
followers = db.StringListProperty()
Simple add follower.
user = User.get(key)
user.followers.append(str(followerKey))
This allow fast query who is followed and followers
User.all().filter('followers', followerKey) # -> followed
This query i/o costly so you can make it faster but more complicated and costly in i/o writes:
class User(db.User):
followers = db.StringListProperty()
follows = db.StringListProperty()
Whatever this is complicated during changes since delete of Users need update follows so you need 2 writes.
You can also store relationships but it is the worse scenario since it is more complex than second example with followers and follows ... - keep in mind than entity can have 1Mb it is not limit but can be.

relationship and build database

For an excercise I need to build something like :
For a course I need to create a review that is made up out of certain reviewlines and feedbackscores.
This review object (unique instance) needs to be filled in by a list of customers.
Depending on the course the review is for, the review will change (e.g.for one course the number of reviewlines and feedbackscores will change). Each customer can be enrolled in more then one course and each review is specific for him.
Now how do I need to see the relationsship between "review" object (unique instance) and "customer" if I want to use JPA to save this all to the db?
A customer can have more then one review he/she needs to fill in.
A certain review object needs to be filled in by many customers (but this is a review object with a certain build [reviewlines and feedbackscores]) and unique for him.
Maybe I see it to complex but what is the best way to build this?
Try the following:
I think it's covered all your design points.
I am trying to read between the lines of your comments, and I think you want to implement a system where you capture a number of 'rules' for the Review (I'm guessing, but examples may be that reviews can be up to n lines, there must be at least m CustomerReviews before the Review gains a degree of quality). If this is indeed the case, I have created a ReviewTemplate class:
ReviewTemplate would have attributes/columns for each of value you would need. These attributes/columns are duplicated on Review
Populate ReviewTemplate with a number of rows, then create a row in Course and link it to one ReviewTemplate
When a Course needs a Review, copy the fields from the ReviewTemplate into the Review
In Java, implement the business rules for Review using the copied values - not the values on ReviewTemplate.
Why copy the values? Well, I bet that at some point, users want to edit the ReviewTemplate table. If so, what happens to the Review objects using the edited ReviewTemplates? Does the modified value on ReviewTemplate somehow invalidate past Reviews and break your business logic? No, because you copied the rule values to Review and so past Reviews will not change.
EDIT: Answers to specific questions
How do you see the duplicating? I can create an entity ReviewTemplate with the specified attributes. In this entity there will be a relationship with reviewlines and feedbackscores.
I see each ReviewTemplate as holding prototypical values for a particular 'type' of Review, which just might include a default reviewLine (but that might not make sense) and a default feedbackScore. When you create the Review, you would do the following:
Instantiate the Review and populate with values from ReviewTemplate
Instantiate as many CustomerReview objects as you need, linking them to the relevant Customer objects (I infer this step from your previous comments. It might also make sense to omit this step until a Customer voluntarily elects to review a Course)
(If appropriate) Populate the CustomerReview attribute feedbackScore with the default value from ReviewTemplate
Instantiate CustomerReviewLine records as appropriate
If you follow this approach, you do not need to add a relationship between ReviewTemplate and CustomerReviewLines.
When I e.g. state that customers 1 to 4 need to fill in the review 4 specific "objects" need to be created that will hold the information and also 4 sets of the needed reviewlines and feedbackscores need to be created so they all can hold the information.
Absolutely.
I just don't know how to implement this is a JPA structure so the information is hold in the db ... ?
JPA allows you to attack the problem in many ways, but the best practice is to manually create both the DB schema and the Java classes (eg see https://stackoverflow.com/a/2585763/1395668). Therefore, for each entity in the diagram, you need to:
Write SQL DDL statements to create the table, columns, primary key and foreign keys, and
Write a Java class denoted with the #entity annotation. Within the class, you will also need to annotate the id (primary key) with #id and the relationships with #OneToMany or #ManyToOne (theirs additional parameters in the annotation to set as well).
Now, on the JPA side, you can do things like:
ReviewTemplate template = course.getReviewTemplate(); //assuming the variable course
Review review = new Review();
review.setCourse(course);
review.setRuleOne(template.getRuleOne());
// Copy other properties here
EntityManager em = // get the entity manager here
em.persist(review);
// Assume a set or list of customers
for (Customer customer : customers) {
CustomerReview cr = new CustomerReview();
cr.setReview(review);
cr.setCustomer(customer);
cr.setFeedbackScore(template.getDefaultFeedbackScore());
// set other CustomerReview properties here
em.persist(cr);
// You can create CustomerReviewLine here as well
If written inside a standard EJB Session Bean, this will all be nicely transacted, and you will have all your new records committed into the DB.
EDIT 2: Additional question
(I'm assuming that the second comment completely supersedes the first)
So when I create a reviewtemplate and I link it to a bunch of customers I write the template to the db and create a bunch of reviews based on the template but linked to the specific customer and with his own unique reviewlines and feedbackscores. Like I see it now the reviewline (more a question or discription) is the same for each review (of a template), it is only the score that changes between the customers
I finally think I understand ReviewLine. I had thought it a place where the Customer enters lines of text the comprise the CustomerReview. I now believe that ReviewLine is a specific question that the Customer is asked, and which the Customer provides a feedbackScore.
With this understanding, here is an updated ER/Class diagram.
Note that there are some significant changes - there are several more tables:
ReviewLineTemplate provides a place for template questions to be stored on a ReviewTemplate
When a Review is instantiated/inserted (which is a copy of a specific ReviewTemplate), the ReviewLineTemplates are copied as ReviewLines. The copy operation allows two important features:
On creation, a Review and its ReviewLines can be customized without affecting the ReviewTemplate or ReviewLineTemplate
Over time, the ReviewTemplate and ReviewLineTemplate can be updated, edited and continually improved, without changing the questions that the Customer has already answered. If CustomerFeedbackScore were linked to ReviewLineTemplate directly, then editing the ReviewLineTemplate would change the question that the Customer has answered, silently invalidating the feedbackScore.
FeedbackScore has been moved to a join-table between ReviewLine and CustomerReview.
Note that this model is fully denormalised which makes it more 'correct' but harder to build a GUI for. A common 'optimization' might be to introduce:
10 (say) columns on ReviewTemplate and Review called reviewLine1 through reviewLine10.
10 (say) columns on CustomerReview called feedbackScore1 through feedbackScore10.
Remove the ReviewTemplateLine, ReviewLine and CustomerReviewLine tables
Doing so is not normalised, and may introduce a set of other problems. YMMV
The structure of data always depends on the requirements, and there never exists a "one-and-only" solution. So, do you need maximised atomiticy or a high performance data system?
The fastest and easiest solution would be not using a database, but hash tables. In your case, you could have something like 3 hash tables for customer, review, and probably another one for the n:n relationship. Or if you're using a database, you could just store an array of the review-primary-keys in one field in the customer table.
However, we all learn in school to do atomicity, so let's do that (I just write the primary/foreign keys!):
Customer (unique_ID, ...)
Review (unique_ID, ...)
Customer_Review (customer_ID, review_ID, ...) --> n:n-relationship
The Customer_Review describes the n:n-relationship between customers and reviews. But if there is only one customer per review possible, you'll do that like this:
Customer (unique_ID, ...)
Review (pk: unique_ID, fk: customer_ID, ...) --> 1:n-relationship
However, I suggest you need to learn ERM as a good starting point: http://en.wikipedia.org/wiki/Entity_relationship_model
You need a ManyToMany relation :
One customer -> several reviews.
One review -> several customers.
So you will have 3 tables in your database schema : Customer, review and a junction table with the customer ID and the review ID.
See Wikipedia : Many to Many

Variable structure in a DB table to be read in a Web form

I have a "variable" structure to be put in a table DB. By "variable" I mean a sequence of couples field/value in which the "kind" of field determines the value type, I don't know exactly field order and I don't know how many times fields can repeat. Sometimes group of fields will repeat several times (it is a fiscal model).
Additional requirement: I should map these variable data into web page forms, handling some CRUD work.
JQuery-ui, Struts 2, Hibernate. Preferred DBMS: MySQL.
The solutions I thought of:
vertical table. I could have some performance issue, which I could resolve with materialized views that "pivot" the rows in columns when I need massive data process. Not gone so far in this direction as it seems to be very expensive for development.
LOB fields. Pack my columns into one of those, perhaps having a "mapping" table to decode each column. My idea is to pull-out searchable fields as "real" columns in order to leave in the LOB just the less interesting mob of data and not to generate performance problems.
or better 2a. Use an xml inside the LOB field. This could be useful to pack/unpack data more comfortably, specially having to map data to a web form.
What do you think? And more, is there some way to create automatic views from xml fields? Or better to map such data to web form? I suspect Hibernate Tools won't work in any of the cases I described.
I hope I have been clear, it's still a bit confusing even to me :)
Your option 1 is the Entity-Attribute-Value antipattern.
See my answer to Product table, many kinds of product, each product has many parameters and my blog post EAV FAIL for alternatives and some reasons why EAV is wrong, at least for a relational database (I cover EAV in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming).
Also read this article about how a similar structure nearly doomed a company: Bad CaRMa.
Your options 2 & 3 are similar to described in How FriendFeed uses MySQL to store schema-less data. I don't know of any automatic way for an ORM to maintain that structure for you. You do have the chore of keeping your inverted index tables in sync with your LOB data.

Categories