Bulk Insert into Postgres SQL from a List<Object> - JAVA

I am trying to do a bulk insert of a list of objects (Entity of table). I currently have it implemented like:
for (BPSPositionTable position : bpsPositionList) {
    if (!position.getCorrespondingMemos().isEmpty()) {
        memoRepo.saveAll(position.getCorrespondingMemos());
    }
}
I know I shouldn't be running insert statements in a loop like this, so I was wondering if there is some sort of JPA magic that can help me with this. I have looked at the @OneToMany annotation, but I'm not sure that will alleviate my problem.
The relationship between the tables is one-to-many (Position --> Memos).
Position has a transient field called correspondingMemos, which contains a list of memos. Memo contains some overlap with the position fields (same fields with same values; don't ask why, it's just how it got designed), but there is a foreign key on the Position's id. I was wondering if there is a simplified way of accomplishing this so I don't need to loop through the list of positions in order to persist each one's memos.
Since the memo foreign key depends on having a match in the position table, positions must be persisted first. I'd like a way to grab all of the correspondingMemos from each position and persist them into their own table.
Please post in the comments if you need additional information.
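One way to avoid calling saveAll once per position is to gather every position's memos into a single list first and hand that list to the repository in one call. The following is only a minimal sketch: the Position and Memo classes are hypothetical stand-ins for the entities, and only getCorrespondingMemos comes from the code above. Whether the resulting inserts are actually batched at the JDBC level depends on Hibernate settings such as hibernate.jdbc.batch_size and on the id generation strategy, neither of which is shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MemoFlattening {
    // Hypothetical stand-in for the memo entity.
    static class Memo {
        final String text;
        Memo(String text) { this.text = text; }
    }

    // Hypothetical stand-in for BPSPositionTable with its transient memo list.
    static class Position {
        private final List<Memo> correspondingMemos;
        Position(List<Memo> memos) { this.correspondingMemos = memos; }
        List<Memo> getCorrespondingMemos() { return correspondingMemos; }
    }

    // Collects the memos of all positions into one list, preserving order.
    static List<Memo> collectMemos(List<Position> positions) {
        return positions.stream()
                .flatMap(p -> p.getCorrespondingMemos().stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Position> positions = Arrays.asList(
                new Position(Arrays.asList(new Memo("a"), new Memo("b"))),
                new Position(new ArrayList<>()),
                new Position(Arrays.asList(new Memo("c"))));
        List<Memo> allMemos = collectMemos(positions);
        // In the question's code this list would be passed to
        // memoRepo.saveAll(allMemos) once, after the positions are saved.
        System.out.println(allMemos.size() + " memos collected");
    }
}
```

Positions with an empty memo list simply contribute nothing, so the isEmpty check from the loop is no longer needed.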

Related

Trying to properly Save/Load an Array via MySQL

So I have an array of PlayerNames that are in a specific 'clan' that I need to save, so when I load the server it will loop through all of the entries.
Here is what I currently have
MySQL Table
I'm not sure what I would put for 'members'. Basically, what I want is to store the UUIDs from an array I have. It can come out as a string, but I just need them to be stored like
members: uuid, uuid, uuid, uuid
I understand how to build the connection, the ResultSet, and the Statement part; I just don't know how to make MySQL know that I am trying to save this list of members as an array. Any help would be appreciated. I apologize if I did something wrong.

The problem here is in your data structure. It's not normalized, whereas RDBMSs are designed to store normalized data. All Create/Update/Delete/Retrieve operations become a lot easier if the data is normalized.
To normalize your tables, you need to stop storing member ids in a single column as CSV.
The Gang table should have the columns
uuid
title
kills
I assume you already have a members table and it looks something like
uuid
member_name
Then you need a new table called gang_members
gang_id
member_id
And create a unique index on those two columns.
Also see https://stackoverflow.com/a/41215681/267540 , Is storing a delimited list in a database column really that bad? and https://stackoverflow.com/a/41305027/267540
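When migrating existing data, each comma-separated members value has to be split into one gang_members row per member. Below is a minimal sketch of that split, assuming the stored value looks like "uuid, uuid, uuid" as in the question. The GangMember class and the toRows method are hypothetical names used only for illustration; each resulting pair would then be inserted into gang_members.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class GangMemberRows {
    // Hypothetical row of the gang_members table: one (gang_id, member_id) pair.
    static class GangMember {
        final UUID gangId;
        final UUID memberId;
        GangMember(UUID gangId, UUID memberId) {
            this.gangId = gangId;
            this.memberId = memberId;
        }
    }

    // Splits a comma-separated list of member UUIDs into one row per member.
    static List<GangMember> toRows(UUID gangId, String membersCsv) {
        List<GangMember> rows = new ArrayList<>();
        for (String part : membersCsv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                rows.add(new GangMember(gangId, UUID.fromString(trimmed)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        UUID gang = UUID.randomUUID();
        String csv = UUID.randomUUID() + ", " + UUID.randomUUID();
        for (GangMember row : toRows(gang, csv)) {
            System.out.println(row.gangId + " -> " + row.memberId);
        }
    }
}
```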

How to check in hibernate that same data with different id exists?

As shown in the code below, I am inserting data into a table with simple Hibernate code.
But even when all the fields are the same, it still saves the data in the table, with only the id changed
(auto-incremented). I want to know whether there is any way to find out that the same data already exists in the table, so that I can avoid redundant inserts. Please suggest an easy way; I know I can always write queries to figure this out.
List<Route> listRoute = newList();
listRoute.get(0).setSource("DelhiTest");
listRoute.get(0).setDestination("KotaTest");
listRoute.get(1).setSource("DelhiTest");
listRoute.get(1).setDestination("KotaTest");
listRoute.get(2).setSource("DelhiTest");
listRoute.get(2).setDestination("KotaTest");
RowReferenceByEntity.setListRoute(listRoute);
new RouteDAOImpl().saveOnly(listRoute.get(0));
new RouteDAOImpl().saveOnly(listRoute.get(1));
new RouteDAOImpl().saveOnly(listRoute.get(2));
Thanks

Searching for duplicate rows that are already persisted can lead to performance issues. For each new item you will need to search for a match in that table.
To guarantee routes are unique you can create a unique index covering the source and destination columns. Any attempt to duplicate data will throw an exception, and this will work fast.
But if all you want is to just find duplicate items in a list look at this post.
How to find duplicate items in list
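For the in-memory case, duplicates within a list can be found by tracking the (source, destination) pairs already seen in a HashSet. The following is a minimal sketch, assuming a simplified Route class with only the two fields used in the question; the findDuplicates method is a hypothetical helper, not part of Hibernate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RouteDuplicates {
    // Simplified stand-in for the Route entity from the question.
    static class Route {
        final String source;
        final String destination;
        Route(String source, String destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    // Returns the routes whose (source, destination) pair already
    // occurred earlier in the list.
    static List<Route> findDuplicates(List<Route> routes) {
        Set<List<String>> seen = new HashSet<>();
        List<Route> duplicates = new ArrayList<>();
        for (Route r : routes) {
            // Set.add returns false when the key is already present.
            if (!seen.add(Arrays.asList(r.source, r.destination))) {
                duplicates.add(r);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Route> routes = Arrays.asList(
                new Route("DelhiTest", "KotaTest"),
                new Route("DelhiTest", "KotaTest"),
                new Route("KotaTest", "DelhiTest"));
        System.out.println(findDuplicates(routes).size() + " duplicate(s) found");
    }
}
```

This only detects duplicates inside the list being saved; rows already in the table still need the unique index to be caught.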

This query counts how many ids have the same value:
SELECT COUNT(id), value
FROM table
GROUP BY value

relationship and build database

For an exercise I need to build something like this:
For a course I need to create a review that is made up out of certain reviewlines and feedbackscores.
This review object (unique instance) needs to be filled in by a list of customers.
Depending on the course the review is for, the review will change (e.g. for one course the number of reviewlines and feedbackscores will change). Each customer can be enrolled in more than one course and each review is specific to him.
Now how should I see the relationship between the "review" object (unique instance) and "customer" if I want to use JPA to save all of this to the db?
A customer can have more than one review he/she needs to fill in.
A certain review object needs to be filled in by many customers (but this is a review object with a certain build [reviewlines and feedbackscores]) and is unique to each of them.
Maybe I am making it too complex, but what is the best way to build this?

Try the following:
I think it's covered all your design points.
I am trying to read between the lines of your comments, and I think you want to implement a system where you capture a number of 'rules' for the Review (I'm guessing, but examples may be that reviews can be up to n lines, there must be at least m CustomerReviews before the Review gains a degree of quality). If this is indeed the case, I have created a ReviewTemplate class:
ReviewTemplate would have attributes/columns for each value you would need. These attributes/columns are duplicated on Review
Populate ReviewTemplate with a number of rows, then create a row in Course and link it to one ReviewTemplate
When a Course needs a Review, copy the fields from the ReviewTemplate into the Review
In Java, implement the business rules for Review using the copied values - not the values on ReviewTemplate.
Why copy the values? Well, I bet that at some point, users want to edit the ReviewTemplate table. If so, what happens to the Review objects using the edited ReviewTemplates? Does the modified value on ReviewTemplate somehow invalidate past Reviews and break your business logic? No, because you copied the rule values to Review and so past Reviews will not change.
EDIT: Answers to specific questions
How do you see the duplicating? I can create an entity ReviewTemplate with the specified attributes. In this entity there will be a relationship with reviewlines and feedbackscores.
I see each ReviewTemplate as holding prototypical values for a particular 'type' of Review, which just might include a default reviewLine (but that might not make sense) and a default feedbackScore. When you create the Review, you would do the following:
Instantiate the Review and populate with values from ReviewTemplate
Instantiate as many CustomerReview objects as you need, linking them to the relevant Customer objects (I infer this step from your previous comments. It might also make sense to omit this step until a Customer voluntarily elects to review a Course)
(If appropriate) Populate the CustomerReview attribute feedbackScore with the default value from ReviewTemplate
Instantiate CustomerReviewLine records as appropriate
If you follow this approach, you do not need to add a relationship between ReviewTemplate and CustomerReviewLines.
When I e.g. state that customers 1 to 4 need to fill in the review 4 specific "objects" need to be created that will hold the information and also 4 sets of the needed reviewlines and feedbackscores need to be created so they all can hold the information.
Absolutely.
I just don't know how to implement this in a JPA structure so the information is held in the db ... ?
JPA allows you to attack the problem in many ways, but the best practice is to manually create both the DB schema and the Java classes (e.g. see https://stackoverflow.com/a/2585763/1395668). Therefore, for each entity in the diagram, you need to:
Write SQL DDL statements to create the table, columns, primary key and foreign keys, and
Write a Java class annotated with @Entity. Within the class, you will also need to annotate the id (primary key) with @Id and the relationships with @OneToMany or @ManyToOne (there are additional parameters to set in these annotations as well).
Now, on the JPA side, you can do things like:
// Assumes a variable named course and an EntityManager named em are available
// (e.g. em is injected into the session bean)
ReviewTemplate template = course.getReviewTemplate();
Review review = new Review();
review.setCourse(course);
review.setRuleOne(template.getRuleOne());
// Copy other properties here
em.persist(review);
// Assume a set or list of customers
for (Customer customer : customers) {
    CustomerReview cr = new CustomerReview();
    cr.setReview(review);
    cr.setCustomer(customer);
    cr.setFeedbackScore(template.getDefaultFeedbackScore());
    // Set other CustomerReview properties here
    em.persist(cr);
    // You can create CustomerReviewLine here as well
}
If written inside a standard EJB Session Bean, this will all be nicely transacted, and you will have all your new records committed into the DB.
EDIT 2: Additional question
(I'm assuming that the second comment completely supersedes the first)
So when I create a reviewtemplate and I link it to a bunch of customers, I write the template to the db and create a bunch of reviews based on the template but linked to the specific customer and with his own unique reviewlines and feedbackscores. As I see it now, the reviewline (more a question or description) is the same for each review (of a template); it is only the score that changes between the customers
I finally think I understand ReviewLine. I had thought it was a place where the Customer enters lines of text that comprise the CustomerReview. I now believe that ReviewLine is a specific question that the Customer is asked, and for which the Customer provides a feedbackScore.
With this understanding, here is an updated ER/Class diagram.
Note that there are some significant changes - there are several more tables:
ReviewLineTemplate provides a place for template questions to be stored on a ReviewTemplate
When a Review is instantiated/inserted (which is a copy of a specific ReviewTemplate), the ReviewLineTemplates are copied as ReviewLines. The copy operation allows two important features:
On creation, a Review and its ReviewLines can be customized without affecting the ReviewTemplate or ReviewLineTemplate
Over time, the ReviewTemplate and ReviewLineTemplate can be updated, edited and continually improved, without changing the questions that the Customer has already answered. If CustomerFeedbackScore were linked to ReviewLineTemplate directly, then editing the ReviewLineTemplate would change the question that the Customer has answered, silently invalidating the feedbackScore.
FeedbackScore has been moved to a join-table between ReviewLine and CustomerReview.
Note that this model is fully normalised, which makes it more 'correct' but harder to build a GUI for. A common 'optimization' might be to introduce:
10 (say) columns on ReviewTemplate and Review called reviewLine1 through reviewLine10.
10 (say) columns on CustomerReview called feedbackScore1 through feedbackScore10.
Remove the ReviewLineTemplate, ReviewLine and CustomerReviewLine tables
Doing so is not normalised, and may introduce a set of other problems. YMMV

The structure of data always depends on the requirements, and there is never a "one-and-only" solution. So, do you need maximised atomicity or a high-performance data system?
The fastest and easiest solution would be to use hash tables instead of a database. In your case, you could have something like 3 hash tables: one for customer, one for review, and probably another one for the n:n relationship. Or, if you're using a database, you could just store an array of the review primary keys in one field in the customer table.
However, we all learn in school to do atomicity, so let's do that (I only write the primary/foreign keys!):
Customer (unique_ID, ...)
Review (unique_ID, ...)
Customer_Review (customer_ID, review_ID, ...) --> n:n-relationship
The Customer_Review describes the n:n-relationship between customers and reviews. But if there is only one customer per review possible, you'll do that like this:
Customer (unique_ID, ...)
Review (pk: unique_ID, fk: customer_ID, ...) --> 1:n-relationship
However, I suggest you need to learn ERM as a good starting point: http://en.wikipedia.org/wiki/Entity_relationship_model

You need a ManyToMany relation :
One customer -> several reviews.
One review -> several customers.
So you will have 3 tables in your database schema : Customer, review and a junction table with the customer ID and the review ID.
See Wikipedia : Many to Many

Avoiding for loop and try to utilize collection APIs instead (performance)

I have a piece of code from an old project.
The logic (in a high level) is as follows:
The user sends a series of {id,Xi} where id is the primary key of the object in the database.
The aim is that the database is updated but the series of Xi values is always unique.
I.e. if the user sends {1,X1} and in the database we have {1,X2},{2,X1} the input should be rejected otherwise we end up with duplicates i.e. {1,X1},{2,X1} i.e. we have X1 twice in different rows.
In lower level the user sends a series of custom objects that encapsulate this information.
Currently the implementation for this uses "brute-force" i.e. continuous for-loops over input and jdbc resultset to ensure uniqueness.
I do not like this approach and moreover the actual implementation has subtle bugs but this is another story.
I am searching for a better approach, both in terms of coding and performance.
What I was thinking is the following:
Create a Set from the user's input list. If the Set has a different size than the list, then the user's input has duplicates. Stop there.
Load data from jdbc.
Create a HashMap<Long,String> with the user's input. The key is the primary key.
Loop over the result set. If the HashMap does not contain a key with the same value as the ResultSet row's id, then add the row to the HashMap.
In the end, get the HashMap's values as a List. If it contains duplicates, reject the input.
This is the algorithm I came up with.
Is there a better approach than this? (I assume that there is no error in the algorithm itself.)
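The steps above can be sketched as follows. This is only an illustration under stated assumptions: the user's input is taken to be already available as a map from primary key to value (so each id appears at most once), values are non-null, and a second map stands in for the rows read from the JDBC result set. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

class UniqueValueCheck {
    // Returns true when applying the input on top of the existing rows
    // keeps all values unique. existingRows stands in for the result set.
    static boolean isAcceptable(Map<Long, String> input, Map<Long, String> existingRows) {
        // Step 1: the input itself must not repeat a value.
        if (new HashSet<>(input.values()).size() != input.size()) {
            return false;
        }
        // Steps 3-4: start from the input, then add the rows it does not touch.
        Map<Long, String> merged = new HashMap<>(input);
        for (Map.Entry<Long, String> row : existingRows.entrySet()) {
            merged.putIfAbsent(row.getKey(), row.getValue());
        }
        // Step 5: the merged state must not contain any value twice.
        return new HashSet<>(merged.values()).size() == merged.size();
    }

    public static void main(String[] args) {
        Map<Long, String> db = new HashMap<>();
        db.put(1L, "X2");
        db.put(2L, "X1");

        Map<Long, String> input = new HashMap<>();
        input.put(1L, "X1"); // would collide with row 2
        System.out.println("accepted: " + isAcceptable(input, db));
    }
}
```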

Purely from a performance point of view, why not let the database figure out that there are duplicates (like {1,X1},{2,X1})? Have a unique constraint in place on the table, and when the update statement fails by throwing an exception, catch it and do whatever you want to do under these input conditions. You may also want to run this as a single transaction in case you need to roll back any partial updates. Of course, this is assuming that you don't have any other business rules driving the updates that you haven't mentioned here.
With your algorithm, you are spending too much time iterating over HashMaps and Lists to remove duplicates, IMHO.

Since you can't change the database, as stated in the comments, I would probably extend your Set idea. Create a HashMap<Long, String> and put all of the items from the database in it, then also create a HashSet<String> with all of the values from your database in it.
Then as you go through the user input, check the key against the hashmap and see if the values are the same, if they are, then great you don't have to do anything because that exact input is already in your database.
If they aren't the same then check the value against the HashSet to see if it already exists. If it does then you have a duplicate.
Should perform much better than a loop.
Edit:
For multiple updates, perform all of the updates on the HashMap created from your database, then once again check whether the size of the Map's value set is different from the size of its key set.
There might be a better way to do this, but this is the best I got.
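The single-update check described in this answer might look like the sketch below. The map and set stand in for data loaded from the database, and the class and method names are hypothetical. It checks one {id, value} pair at a time; for several updates at once the Edit above applies, i.e. all updates are applied to the map before uniqueness is checked.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SingleUpdateCheck {
    // Returns true if storing value under id would duplicate a value
    // that already belongs to another row.
    static boolean causesDuplicate(long id, String value,
                                   Map<Long, String> dbRows, Set<String> dbValues) {
        // Same id and same value: the row is already stored exactly like this.
        if (value.equals(dbRows.get(id))) {
            return false;
        }
        // Otherwise the value must not be used by any existing row.
        return dbValues.contains(value);
    }

    public static void main(String[] args) {
        Map<Long, String> dbRows = new HashMap<>();
        dbRows.put(1L, "X2");
        dbRows.put(2L, "X1");
        Set<String> dbValues = new HashSet<>(dbRows.values());

        System.out.println("duplicate: " + causesDuplicate(1L, "X1", dbRows, dbValues));
    }
}
```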

I'd opt for a database-side solution. Assuming a table with the columns id and value, you should make a list with all the "values", and use the following SQL:
select count(*) from tbl where value in (:values);
binding the :values parameter to the list of values however is appropriate for your environment. (Trivial when using Spring JDBC and a database that supports the in operator, less so for lesser setups. As a last resort you can generate the SQL dynamically.) You will get a result set with one row and one column of a numeric type. If it's 0, you can then insert the new data; if it's 1, report a constraint violation. (If it's anything else you have a whole new problem.)
If you need to check for every item in the user input, change the query to:
select value from tbl where value in (:values)
store the result in a set (called e.g. duplicates), and then loop over the user input items and check whether the value of the current item is in duplicates.
This should perform better than snarfing the entire dataset into memory.
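Generating the SQL dynamically, the last resort mentioned above, could look like the following sketch. It builds one "?" placeholder per value so the values can still be bound through a PreparedStatement rather than concatenated into the query. The table and column names are the ones assumed in this answer; the class and method names are hypothetical.

```java
import java.util.Collections;

class InClauseBuilder {
    // Builds the duplicate-lookup query with one "?" placeholder per value.
    static String buildQuery(int valueCount) {
        if (valueCount < 1) {
            throw new IllegalArgumentException("at least one value is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(valueCount, "?"));
        return "select value from tbl where value in (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // For three input values the statement gets three placeholders,
        // which would then be bound in order with PreparedStatement.setString.
        System.out.println(buildQuery(3));
    }
}
```

Databases usually limit the number of bind parameters per statement, so very large inputs may need to be checked in chunks.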

Insert fail then update OR Load and then decide if insert or update

I have a webservice in java that receives a list of information to be inserted or updated in a database. I don't know which one is to insert or update.
Which is the best approach to obtain better performance:
Iterate over the list (an object list, with the table PK on it) and try to insert each entry into the database. If the insert fails, run an update.
Try to load the entry from the database. If a result is retrieved, update; if not, insert the entry.
Another option? Tell me about it :)
In the first calls, I believe that most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.

If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194

If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.

The important thing is to understand the balance or ratio between the number of inserts versus the number of updates on the list you receive. IMHO you should implement an abstract strategy that says "persist this in the database". Then create concrete strategies that (for example):
Check for the primary key; if zero records are found, do the insert, else do the update.
Do the update and, if it fails, do the insert.
Others.
And then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible, which could depend on your domain, you can add a heuristic that selects the best strategy based on the input entities in the set.
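A rough sketch of this strategy idea follows. To stay self-contained it uses an in-memory Map as a stand-in for the database table, and it selects the strategy through a small name registry rather than by loading a fully qualified class name; both are simplifications, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class UpsertStrategies {
    // Abstract strategy: "persist this entry in the database".
    interface PersistStrategy {
        void persist(Map<Long, String> table, long id, String value);
    }

    // Looks the key up first, then decides between insert and update.
    static class LookupFirst implements PersistStrategy {
        public void persist(Map<Long, String> table, long id, String value) {
            if (table.containsKey(id)) {
                table.replace(id, value); // update
            } else {
                table.put(id, value);     // insert
            }
        }
    }

    // Tries the update first and inserts only if no row was updated.
    static class UpdateFirst implements PersistStrategy {
        public void persist(Map<Long, String> table, long id, String value) {
            boolean updated = table.replace(id, value) != null;
            if (!updated) {
                table.put(id, value);
            }
        }
    }

    // Hypothetical registry; the name would come from a configuration file.
    static PersistStrategy forName(String name) {
        switch (name) {
            case "lookup-first": return new LookupFirst();
            case "update-first": return new UpdateFirst();
            default: throw new IllegalArgumentException("unknown strategy: " + name);
        }
    }

    public static void main(String[] args) {
        Map<Long, String> table = new HashMap<>();
        PersistStrategy strategy = forName("update-first");
        strategy.persist(table, 1L, "first");  // no row yet, so this inserts
        strategy.persist(table, 1L, "second"); // row exists, so this updates
        System.out.println(table);
    }
}
```

Both strategies leave the table in the same state; they differ only in which statement is attempted first, which is what matters when most entries are new or most are updates.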

MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1

Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    -- Optimistic case: the row is new
    INSERT INTO my_table (id, col1)
    SELECT _id, _col1;
EXCEPTION WHEN unique_violation THEN
    -- The key already exists, so update the row instead
    UPDATE my_table
    SET col1 = _col1
    WHERE id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
