My problem is more or less the one asked in Spring Batch : Compare Data Between Database, however I still cannot get my head around it. Maybe mine is a bit different.
I have datasource A and I want to write into database B.
I have full trust in datasource A, so:
If A contains a record that B does not, I have to add it to B.
If A does not contain a record that B does, I have to delete it from B.
If both A and B contain the record, I check and update the record in B accordingly.
I thought my approach would be as simple as:
Read Person from A datasource
Read Person from B datasource
(Those two Person objects can be different entities)
Compare and find the ones to add, update, and delete.
Update the database.
However, since I am pretty new to Spring Batch, the implementation is ending up as spaghetti code, which I don't want; I want to learn the right way to do it.
So;
I created this job below
@Bean
public Job job() {
    return jobBuilderFactory
            .get("myNewbieJob")
            .start(populateARepository())
            .next(populateBRepository())
            .next(compareAndSubmitCountryRepositoriesTasklet())
            .build();
}
To explain;
populateARepository(): I have a Repository object that just contains a list. This step just adds records to the list.
The part that I don't like is that compareAndSubmitCountryRepositoriesTasklet() is basically comparing those repositories... and then I don't know what to do.
If I add DB access and push from that class, I won't like it, because I just wanted it to be a step where I find the differences.
If I create another class which contains 3 separate lists for toUpdate,toDelete,toInsert, and then in the next step somehow use that repository... that sounded wrong to me as well.
So, here I am. Any kind of guidance is appreciated. How would you deal in this situation?
Thank you in advance.
Before talking about Spring Batch, I would first look for an algorithm to solve this problem. If I understand correctly, you basically need to replicate the same state of records in database A into database B. What you can do is:
Read Person items from database A
Use an item processor to do the comparison with table B. Here, you would mark the item accordingly to be inserted, updated or deleted
Use an item writer that checks the type of record and does the necessary operation. Here, you can create a custom writer or use a ClassifierCompositeItemWriter (see this example)
This approach works well with small/medium datasets but not for large datasets (due to the additional query for each item, but this is inherent to the algorithm itself and not the implementation with Spring Batch).
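As a rough sketch of the processor and writer described above (the Person fields, the PersonOperation wrapper, the personBDao lookup and the three delegate writers are assumptions for illustration, not part of the question; each top-level class would live in its own file):

import org.springframework.batch.item.ItemProcessor;

// Hypothetical wrapper carrying the item plus the operation it needs in database B.
class PersonOperation {
    public enum OperationType { INSERT, UPDATE, DELETE }

    private final Person person;
    private final OperationType type;

    public PersonOperation(Person person, OperationType type) {
        this.person = person;
        this.type = type;
    }

    public Person getPerson() { return person; }
    public OperationType getType() { return type; }
}

// Item processor that compares each Person read from A with its counterpart in B
// and tags it with the operation to perform; unchanged records are filtered out.
public class PersonSyncProcessor implements ItemProcessor<Person, PersonOperation> {

    private final PersonBDao personBDao; // hypothetical DAO that queries database B

    public PersonSyncProcessor(PersonBDao personBDao) {
        this.personBDao = personBDao;
    }

    @Override
    public PersonOperation process(Person personFromA) {
        Person personFromB = personBDao.findById(personFromA.getId());
        if (personFromB == null) {
            return new PersonOperation(personFromA, PersonOperation.OperationType.INSERT);
        }
        if (!personFromA.equals(personFromB)) {
            return new PersonOperation(personFromA, PersonOperation.OperationType.UPDATE);
        }
        return null; // identical record, nothing to write
    }
}

The tagged items can then be routed with a ClassifierCompositeItemWriter, roughly like this (insertWriter, updateWriter and deleteWriter would be writers against database B). Note that records present only in B (the delete case) are never seen when reading from A, so they would need their own step or query:

ClassifierCompositeItemWriter<PersonOperation> writer = new ClassifierCompositeItemWriter<>();
writer.setClassifier(op -> {
    switch (op.getType()) {
        case INSERT: return insertWriter;
        case UPDATE: return updateWriter;
        default:     return deleteWriter;
    }
});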
When updating an existing field in a table, what is the best practice to consider? I write a lot of PUT methods that overwrite the existing data on update (for large objects involving a lot of logic and calculation, I consider this the better option).
But what about the cases where I just need to update one field? I have seen a pattern where people use the same or a similar PUT method when they update data; in those functions, they delete the old record and put the new data into the table.
Would it be better to use a JPA UPDATE ... SET query to change exactly what we need?
For example, there is an existing function where the user creates or updates their record; should I add an UPDATE ... SET query inside that function so it updates or inserts into the new table I created?
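To illustrate what I mean by an "update set" query, something like this sketch (the Account entity and its email field are just made-up names for illustration):

// Update only the email column of one row; nothing else on the entity is touched.
// Note: a bulk JPQL update bypasses the persistence context, so already-loaded instances are not refreshed.
int updated = entityManager.createQuery(
        "UPDATE Account a SET a.email = :email WHERE a.id = :id")
    .setParameter("email", newEmail)
    .setParameter("id", accountId)
    .executeUpdate();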
Thanks for reading this load of text; I hope to gain a clearer point of view and become a better programmer.
I did try to Google the definitions, but they are still not clear to me. I hope I can get a clearer picture here.
I'm looking for the best practice / design pattern for approving changes to data in a database.
The requirement is that person A wants to update some data, for example an email, an address, or a company name. None of the changes from person A are visible on the webpage until person B checks these changes and confirms them. The data are stored in a database.
My question now is: what is the best practice / design pattern for the database? Duplicate the database and copy the data to the other one on commit? Or only one database, where I copy the whole dataset with the changed values and tag it with an extra column (has to be checked)?
I tried to find something with Google, but I think I'm not using the right buzzwords.
My only buzzword was the four-eyes principle, and the solutions I found were a workflow engine like Camunda, or using a DMS or ECM.
There has to be a simple solution for this problem, or is the problem really that uncommon?
Thanks for help.
PS: the user changes the data on a website, not directly in the database.
You have to design a data structure such as "generic order" or "generic job". Here you will have a master table that describes a job or an order. The fields would be:
job_id
created_at
created_by
job_ident
job_name
permit_at
permit_by
Any job/order has a key/value set of data, which would be modeled like this:
job_id (fk_job_id)
key
value
Every job must have an implementation, so the interpretation of the key/value pairs is up to this implementation. The implementation would be triggered by a user as he permits the changes associated with the current job or order.
The generic model allows you to have one data structure regardless of any table your application has.
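As a rough sketch, those two tables could be mapped with JPA entities like this (types, column names and annotations are assumptions; each class would live in its own file):

import java.util.Date;
import java.util.List;
import javax.persistence.*;

// Master table describing a job/order and its approval state.
@Entity
public class Job {

    @Id
    @GeneratedValue
    private Long jobId;

    @Temporal(TemporalType.TIMESTAMP)
    private Date createdAt;
    private String createdBy;

    private String jobIdent;
    private String jobName;

    @Temporal(TemporalType.TIMESTAMP)
    private Date permitAt;   // stays null until the changes are permitted
    private String permitBy;

    @OneToMany(mappedBy = "job")
    private List<JobData> data;
}

// Key/value rows holding the proposed changes of one job.
@Entity
public class JobData {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne
    @JoinColumn(name = "fk_job_id")
    private Job job;

    @Column(name = "data_key")   // "key" is a reserved word in some databases
    private String key;

    @Column(name = "data_value")
    private String value;
}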
You can also draw some inspiration from the transaction logs of a database. Please note that changes have an order to them, so you can detect conflicts.
However, you can also enhance the data structure above to lock an entity until approval.
You can also have a look at Spring Batch, where there is a similar data structure and processing model.
Also see the Apache Camel project; maybe you can use Apache Camel as well.
For an exercise I need to build something like this:
For a course I need to create a review that is made up of certain reviewlines and feedbackscores.
This review object (unique instance) needs to be filled in by a list of customers.
Depending on the course the review is for, the review will change (e.g. for one course the number of reviewlines and feedbackscores will change). Each customer can be enrolled in more than one course, and each review is specific to him.
Now how do I need to see the relationship between the "review" object (unique instance) and the "customer" if I want to use JPA to save all this to the db?
A customer can have more than one review he/she needs to fill in.
A certain review object needs to be filled in by many customers (but this is a review object with a certain build [reviewlines and feedbackscores]) and is unique to each of them.
Maybe I am seeing it as too complex, but what is the best way to build this?
Try the following:
I think it covers all your design points.
I am trying to read between the lines of your comments, and I think you want to implement a system where you capture a number of 'rules' for the Review (I'm guessing, but examples may be that reviews can be up to n lines, there must be at least m CustomerReviews before the Review gains a degree of quality). If this is indeed the case, I have created a ReviewTemplate class:
ReviewTemplate would have attributes/columns for each value you would need. These attributes/columns are duplicated on Review
Populate ReviewTemplate with a number of rows, then create a row in Course and link it to one ReviewTemplate
When a Course needs a Review, copy the fields from the ReviewTemplate into the Review
In Java, implement the business rules for Review using the copied values - not the values on ReviewTemplate.
Why copy the values? Well, I bet that at some point, users want to edit the ReviewTemplate table. If so, what happens to the Review objects using the edited ReviewTemplates? Does the modified value on ReviewTemplate somehow invalidate past Reviews and break your business logic? No, because you copied the rule values to Review and so past Reviews will not change.
EDIT: Answers to specific questions
How do you see the duplicating? I can create an entity ReviewTemplate with the specified attributes. In this entity there will be a relationship with reviewlines and feedbackscores.
I see each ReviewTemplate as holding prototypical values for a particular 'type' of Review, which just might include a default reviewLine (but that might not make sense) and a default feedbackScore. When you create the Review, you would do the following:
Instantiate the Review and populate with values from ReviewTemplate
Instantiate as many CustomerReview objects as you need, linking them to the relevant Customer objects (I infer this step from your previous comments. It might also make sense to omit this step until a Customer voluntarily elects to review a Course)
(If appropriate) Populate the CustomerReview attribute feedbackScore with the default value from ReviewTemplate
Instantiate CustomerReviewLine records as appropriate
If you follow this approach, you do not need to add a relationship between ReviewTemplate and CustomerReviewLines.
When I, e.g., state that customers 1 to 4 need to fill in the review, 4 specific "objects" need to be created that will hold the information, and also 4 sets of the needed reviewlines and feedbackscores need to be created so they can all hold the information.
Absolutely.
I just don't know how to implement this in a JPA structure so the information is held in the db ... ?
JPA allows you to attack the problem in many ways, but the best practice is to manually create both the DB schema and the Java classes (e.g. see https://stackoverflow.com/a/2585763/1395668). Therefore, for each entity in the diagram, you need to:
Write SQL DDL statements to create the table, columns, primary key and foreign keys, and
Write a Java class denoted with the @Entity annotation. Within the class, you will also need to annotate the id (primary key) with @Id and the relationships with @OneToMany or @ManyToOne (there are additional parameters in the annotations to set as well).
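As a rough sketch of what one of those annotated classes might look like (the field names are assumptions based on the discussion above):

import java.util.List;
import javax.persistence.*;

@Entity
public class Review {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne                        // many Reviews belong to one Course
    private Course course;

    @OneToMany(mappedBy = "review")   // one Review is filled in via many CustomerReviews
    private List<CustomerReview> customerReviews;

    private String ruleOne;           // rule values copied over from the ReviewTemplate

    // getters and setters omitted
}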
Now, on the JPA side, you can do things like:
ReviewTemplate template = course.getReviewTemplate(); // assuming the variable course
Review review = new Review();
review.setCourse(course);
review.setRuleOne(template.getRuleOne());
// Copy other properties here

EntityManager em = getEntityManager(); // get the entity manager here, e.g. injected via @PersistenceContext
em.persist(review);

// Assume a set or list of customers
for (Customer customer : customers) {
    CustomerReview cr = new CustomerReview();
    cr.setReview(review);
    cr.setCustomer(customer);
    cr.setFeedbackScore(template.getDefaultFeedbackScore());
    // set other CustomerReview properties here
    em.persist(cr);
    // You can create CustomerReviewLine here as well
}
If written inside a standard EJB Session Bean, this will all be nicely transacted, and you will have all your new records committed into the DB.
EDIT 2: Additional question
(I'm assuming that the second comment completely supersedes the first)
So when I create a reviewtemplate and I link it to a bunch of customers, I write the template to the db and create a bunch of reviews based on the template, but linked to the specific customer and with his own unique reviewlines and feedbackscores. The way I see it now, the reviewline (more a question or description) is the same for each review (of a template); it is only the score that changes between the customers.
I finally think I understand ReviewLine. I had thought it a place where the Customer enters lines of text that comprise the CustomerReview. I now believe that ReviewLine is a specific question that the Customer is asked, and to which the Customer provides a feedbackScore.
With this understanding, here is an updated ER/Class diagram.
Note that there are some significant changes - there are several more tables:
ReviewLineTemplate provides a place for template questions to be stored on a ReviewTemplate
When a Review is instantiated/inserted (which is a copy of a specific ReviewTemplate), the ReviewLineTemplates are copied as ReviewLines. The copy operation allows two important features:
On creation, a Review and its ReviewLines can be customized without affecting the ReviewTemplate or ReviewLineTemplate
Over time, the ReviewTemplate and ReviewLineTemplate can be updated, edited and continually improved, without changing the questions that the Customer has already answered. If CustomerFeedbackScore were linked to ReviewLineTemplate directly, then editing the ReviewLineTemplate would change the question that the Customer has answered, silently invalidating the feedbackScore.
FeedbackScore has been moved to a join-table between ReviewLine and CustomerReview.
Note that this model is fully normalised, which makes it more 'correct' but harder to build a GUI for. A common 'optimization' might be to introduce:
10 (say) columns on ReviewTemplate and Review called reviewLine1 through reviewLine10.
10 (say) columns on CustomerReview called feedbackScore1 through feedbackScore10.
Remove the ReviewTemplateLine, ReviewLine and CustomerReviewLine tables
Doing so is not normalised, and may introduce a set of other problems. YMMV
The structure of data always depends on the requirements, and there is never a "one-and-only" solution. So, do you need maximised atomicity or a high-performance data system?
The fastest and easiest solution would be not using a database, but hash tables. In your case, you could have something like 3 hash tables for customer, review, and probably another one for the n:n relationship. Or if you're using a database, you could just store an array of the review-primary-keys in one field in the customer table.
However, we all learn in school to do atomicity, so let's do that (I just write the primary/foreign keys!):
Customer (unique_ID, ...)
Review (unique_ID, ...)
Customer_Review (customer_ID, review_ID, ...) --> n:n-relationship
The Customer_Review describes the n:n-relationship between customers and reviews. But if there is only one customer per review possible, you'll do that like this:
Customer (unique_ID, ...)
Review (pk: unique_ID, fk: customer_ID, ...) --> 1:n-relationship
However, I suggest learning ERM as a good starting point: http://en.wikipedia.org/wiki/Entity_relationship_model
You need a ManyToMany relation:
One customer -> several reviews.
One review -> several customers.
So you will have 3 tables in your database schema: Customer, Review, and a junction table with the customer ID and the review ID.
See Wikipedia: Many to Many
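A minimal JPA sketch of that mapping might look like this (entity, table and column names are assumptions; each class would live in its own file):

import java.util.List;
import javax.persistence.*;

@Entity
public class Customer {

    @Id
    @GeneratedValue
    private Long id;

    // Owning side; the junction table is generated from this definition.
    @ManyToMany
    @JoinTable(name = "customer_review",
               joinColumns = @JoinColumn(name = "customer_id"),
               inverseJoinColumns = @JoinColumn(name = "review_id"))
    private List<Review> reviews;
}

@Entity
public class Review {

    @Id
    @GeneratedValue
    private Long id;

    // Inverse side of the relation.
    @ManyToMany(mappedBy = "reviews")
    private List<Customer> customers;
}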
Sorry in advance if someone has already answered this specific question but I have yet to find an answer to my problem so here goes.
I am working on an application (no, I cannot give the code, as it is for a job, so I'm sorry about that one) which uses DAOs, Hibernate, POJOs, and all that stuff for communicating and writing to the database. This works well for the application assuming I don't have a ton of data to check when I call Session.flush(). That being said, there is a page where a user can add any number of items to a product, and there is one particular case where there are something along the lines of 25 items. Each item has about 8 fields apiece that are all stored in the database. When I call the flush, it does save everything to the database, but it takes FOREVER to complete. The three lines I am calling are:
session.merge(myObject);
session.flush();
session.refresh(myObject);
I have tried a number of different combinations of things to fix this problem and a number of different solutions, so coming back and saying "Don't use flush()" isn't much help, as saveOrUpdate() and the other Hibernate session methods don't seem to work. The only solution I can think of is to scrap the entire project (the code we got was inherited and poorly written, to say the least) or tell the user community to suck it up.
It is my understanding from the Hibernate API that if you want to write the data to the database, it runs a check on every item; if there is a difference, it creates a queue of update queries and then runs the queries. It seems as though this data is being updated every time, because the "DATE_CREATED" column in my database is different even if the other values are unchanged.
What I was wondering is if there was another way to prevent such a large committing of data or a way of excluding that particular column from the "check" hibernate does so I don't have to commit all 25 items if I only made a change to 1?
Thanks in advance.
Mike
Well, you really cannot avoid the dirty checking in Hibernate unless you use a StatelessSession. Of course, you lose a lot of features (lazy loading, etc.) with that, but it's up to you to make this decision.
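For reference, a StatelessSession is used roughly like this (a minimal sketch, assuming a SessionFactory is at hand and reusing the myObject from above); there is no persistence context, so no dirty checking, caching or cascading happens:

StatelessSession statelessSession = sessionFactory.openStatelessSession();
Transaction tx = statelessSession.beginTransaction();
try {
    statelessSession.update(myObject); // issues the UPDATE immediately, no dirty check
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    statelessSession.close();
}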
Another option: I would definitely try to use dynamic-update=true in your entity. Like:
@Entity(dynamicUpdate = true) // org.hibernate.annotations.Entity; newer Hibernate versions use @DynamicUpdate instead
class MyClass
Using that, Hibernate will update the modified columns only. In small tables, with few columns, it's not so effective, but in your case maybe it can help make the whole process faster as you cannot avoid dirty checking with a regular Hibernate Session. Updating a few columns instead of the whole object is always better, right?
This post talks more about the dynamic-update attribute.
What I was wondering is if there was another way to prevent such a large committing of data or a way of excluding that particular column from the "check" hibernate does so I don't have to commit all 25 items if I only made a change to 1?
I would profile the application to ensure that the dirty checking on flush is actually the problem. If you find that this is indeed the case you can use evict to manage the session size.
session.update(myObject);
session.flush();
session.evict(myObject);
I need a sample program in Java for keeping the history of a table when a user inserts, updates, or deletes rows in that table. Can anybody help with this?
Thanks in advance.
If you are working with Hibernate you can use Envers to solve this problem.
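In its simplest form this boils down to annotating the entity (a minimal sketch; the Person entity is just an example name, and Envers additionally needs its revision tables, which it can create for you):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.envers.Audited;

@Entity
@Audited // Envers records every insert/update/delete of this entity in a Person_AUD table
public class Person {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted
}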
You have two options for this:
Let the database handle this automatically using triggers. I don't know what database you're using but all of them support triggers that you can use for this.
Write code in your program that does something similar when inserting, updating and deleting a user.
Personally, I prefer the first option. It probably requires less maintenance. There may be multiple places where you update a user, and all those places would need the code to update the other table. Besides, in the database you have more options for specifying required values and integrity constraints.
Well, we normally have our own history tables which (mostly) look like the original table. Since most of our tables already have the creation date, modification date and the respective users, all we need to do is copy the dataset from the live table to the history table with a creation date of now().
We're using Hibernate so this could be done in an interceptor, but there may be other options as well, e.g. some database trigger executing a script, etc.
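A rough sketch of such a Hibernate interceptor (the actual copy into the history table is only indicated by comments; how it is done is up to the application):

import java.io.Serializable;
import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

public class HistoryInterceptor extends EmptyInterceptor {

    @Override
    public boolean onSave(Object entity, Serializable id, Object[] state,
                          String[] propertyNames, Type[] types) {
        // copy the inserted row into the history table here
        return false; // we did not modify the entity state
    }

    @Override
    public boolean onFlushDirty(Object entity, Serializable id, Object[] currentState,
                                Object[] previousState, String[] propertyNames, Type[] types) {
        // copy the updated row into the history table with a creation date of now()
        return false;
    }

    @Override
    public void onDelete(Object entity, Serializable id, Object[] state,
                         String[] propertyNames, Type[] types) {
        // record the deleted row in the history table
    }
}

The interceptor is then registered on the Hibernate Configuration or passed in when opening the Session.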
How is this a Java question?
This should be moved to the Database section.
You need to create a history table. Then create database triggers on the original table for "create or replace trigger before insert or update or delete on table for each row ...."
I think this can be achieved by creating a trigger in the SQL server.
You can create the trigger as follows:
Syntax:
CREATE TRIGGER trigger_name
{BEFORE | AFTER} {INSERT | UPDATE | DELETE}
ON table_name FOR EACH ROW
triggered_statement
You'll have to create 2 triggers: one for before the operation is performed and another for after the operation is performed.
Otherwise it can be achieved through code as well, but that would be a bit tedious for the code to handle in the case of batch processes.
You should try using triggers. You can have a separate table (an exact replica of the table whose history you need to maintain).
This table will then be updated by trigger after every insert/update/delete on your main table.
Then you can write your java code to get these changes from the second history table.
I think you can use the redo log of your underlying database to keep track of the operations performed. Is there any particular reason to go for a program?
You could try creating, say, a List of the objects from the table (assuming you have objects for the data). This will allow you to loop through the list and compare it to the current data in the table, so you can see if any changes occurred.
You can even create another list of objects containing an enum that gives you the action (DELETE, UPDATE, CREATE) along with the new data.
Haven't done this before, just an idea.
Like @Ashish mentioned, triggers can be used to insert into a separate table - this is commonly referred to as an audit-trail or audit-log table.
Below are the columns generally defined in such an audit-trail table: 'Action' (insert, update, delete), table name (the table into which the row was inserted/deleted/updated), key (the primary key of that table, on a need basis), and timestamp (the time at which the action was done).
It is better to audit-log after the entire transaction is through. If not, in case of an exception being passed back to the code side, a separate call to update the audit tables will be needed. Hope this helps.
If you are talking about DB tables, you may either use triggers in the DB or add some extra code within your application - probably using aspects. If you are using JPA, you may use entity listeners, or add some extra logic via an aspect on your DAO objects and apply that aspect to all DAOs that perform CRUD on entities that need to keep historical data. If your DAO object is a stateless bean, you may use an Interceptor to achieve that; otherwise use Java proxy functionality, cglib, or another library that provides aspect functionality. If you are using Spring instead of EJB, you may advise your DAOs within the application context config file.
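For the JPA entity-listener option, a minimal sketch might look like this (the record method body is a placeholder for whatever actually writes the history row):

import javax.persistence.PrePersist;
import javax.persistence.PreRemove;
import javax.persistence.PreUpdate;

public class AuditListener {

    @PrePersist
    public void onInsert(Object entity) {
        record("INSERT", entity);
    }

    @PreUpdate
    public void onUpdate(Object entity) {
        record("UPDATE", entity);
    }

    @PreRemove
    public void onDelete(Object entity) {
        record("DELETE", entity);
    }

    private void record(String action, Object entity) {
        // write a row to the history/audit table here, e.g. via JDBC or a separate EntityManager
    }
}

The listener is attached to an entity by putting @EntityListeners(AuditListener.class) on the entity class.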
Triggers are not advisable when the audit data is stored in a file rather than in the database. My suggestion is to create an "AUDIT" table and write Java code (with the help of servlets) that stores the data in a file, in the same DB, or in another DB.