How to check in hibernate that same data with different id exists? - java

As shown in the code below, I am inserting data into a table with simple Hibernate code.
But even when all the fields are the same, it still saves the data in the table, with only the id changed
(auto-incremented). Is there any way to detect that the same data already exists in the table, so that I can avoid redundant inserts? Please suggest an easy way; I know I can always write queries to figure this out.
List<Route> listRoute = newList();
listRoute.get(0).setSource("DelhiTest");
listRoute.get(0).setDestination("KotaTest");
listRoute.get(1).setSource("DelhiTest");
listRoute.get(1).setDestination("KotaTest");
listRoute.get(2).setSource("DelhiTest");
listRoute.get(2).setDestination("KotaTest");
RowReferenceByEntity.setListRoute(listRoute);
new RouteDAOImpl().saveOnly(listRoute.get(0));
new RouteDAOImpl().saveOnly(listRoute.get(1));
new RouteDAOImpl().saveOnly(listRoute.get(2));
Thanks

Searching for duplicates among rows that are already persisted can lead to performance issues: for each new item you will need to search for a match in that table.
To guarantee that routes are unique, you can create a unique index covering source and destination ids. Any attempt to insert duplicate data will then throw an exception, and this will work fast.
But if all you want is to find duplicate items in a list, look at this post:
How to find duplicate items in list
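For the in-list case, a minimal Java sketch might look like the following. The Route class here is a hypothetical stand-in for the question's entity (only source and destination are modeled), and RouteDeduplicator and withoutDuplicates are illustrative names, not part of Hibernate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RouteDeduplicator {

    /** Hypothetical stand-in for the Route entity; the id is deliberately left out. */
    static final class Route {
        final String source;
        final String destination;

        Route(String source, String destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    /** Keeps the first route seen for each (source, destination) pair, in input order. */
    static List<Route> withoutDuplicates(List<Route> routes) {
        Set<List<String>> seenPairs = new HashSet<>();
        List<Route> unique = new ArrayList<>();
        for (Route route : routes) {
            // Set.add returns false when an equal pair was already recorded.
            if (seenPairs.add(Arrays.asList(route.source, route.destination))) {
                unique.add(route);
            }
        }
        return unique;
    }
}
```

With the three identical routes from the question, withoutDuplicates would return a single Route, so only one saveOnly call would be needed. This only covers duplicates inside one list; rows that are already in the table still need the unique index.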

This query counts how many ids have the same value:
SELECT COUNT(id), value
FROM table
GROUP BY value

Related

Bulk Insert into Postgres SQL from a List<Object> - JAVA

I am trying to do a bulk insert of a list of objects (Entity of table). I currently have it implemented like:
for(BPSPositionTable position : bpsPositionList){
if(!position.getCorrespondingMemos().isEmpty()){
memoRepo.saveAll(position.getCorrespondingMemos());
}
}
I know I shouldn't be running insert statements in a loop like this, so I was wondering if there is some sort of JPA magic that can help me with this. I have looked at the @OneToMany annotation, but I'm not sure that will alleviate my problem.
The relationship in the database is one-to-many (Position --> Memos).
Position has a transient field called correspondingMemos, which contains a list of memos. Memo contains some overlap with the position fields (same fields with same values; don't ask why, it's just how it got designed), but there is a foreign key on the position's id. I was wondering if there is a simpler way of accomplishing this, so that I don't need to loop through the list of positions in order to persist each one's memos.
Since the memo foreign key depends on having a match in the position table, positions must be persisted first. I'd like a way to grab all of the correspondingMemos from each position and persist them into their own database.
Please post in the comments if you need additional information.
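One way to collect every position's memos into a single list is a flatMap over the positions. The sketch below uses hypothetical stand-ins for the Position and Memo types (only getCorrespondingMemos comes from the question); MemoCollector and allMemos are made-up names for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MemoCollector {

    /** Hypothetical stand-in for the memo entity. */
    static final class Memo {
        final String text;

        Memo(String text) {
            this.text = text;
        }
    }

    /** Hypothetical stand-in for the position entity with its transient memo list. */
    static final class Position {
        private final List<Memo> correspondingMemos;

        Position(List<Memo> correspondingMemos) {
            this.correspondingMemos = correspondingMemos;
        }

        List<Memo> getCorrespondingMemos() {
            return correspondingMemos;
        }
    }

    /** Flattens the memos of all positions into one list; empty lists contribute nothing. */
    static List<Memo> allMemos(List<Position> positions) {
        return positions.stream()
                .flatMap(position -> position.getCorrespondingMemos().stream())
                .collect(Collectors.toList());
    }
}
```

The resulting list could then be handed to memoRepo.saveAll in a single call after the positions have been saved. Whether that call is actually sent to the database as a batch depends on the JPA provider and its configuration, which is not shown here.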

how to suppress mysql error java

I'm trying to automate the addition of links from an RSS feed to a MySQL table. I want to take input from the feed every hour, but there are usually links that I have already grabbed still present on the RSS page when I reference it again.
I've got Java code that works the first time, but when I try to add a duplicate file to the SQL table I get an exception. I thought that MySQL would just ignore and pass over the duplicates, as I'm using the MUL key on the field name in question, but I get an exception instead.
Any ideas on how to get this rolling? I don't want duplicates, and I don't want duplicates to stop other new things from being added.
Thanks!
You can use REPLACE INTO instead of INSERT INTO:
REPLACE works exactly like INSERT, except that if an old row in the
table has the same value as a new row for a PRIMARY KEY or a UNIQUE
index, the old row is deleted before the new row is inserted.
http://dev.mysql.com/doc/refman/5.0/en/replace.html
You can also use INSERT IGNORE INTO ... if you want to keep the old value instead of replacing it.

Avoiding for loop and try to utilize collection APIs instead (performance)

I have a piece of code from an old project.
The logic (in a high level) is as follows:
The user sends a series of {id,Xi} where id is the primary key of the object in the database.
The aim is that the database is updated but the series of Xi values is always unique.
I.e. if the user sends {1,X1} and in the database we have {1,X2},{2,X1} the input should be rejected otherwise we end up with duplicates i.e. {1,X1},{2,X1} i.e. we have X1 twice in different rows.
In lower level the user sends a series of custom objects that encapsulate this information.
Currently the implementation for this uses "brute-force" i.e. continuous for-loops over input and jdbc resultset to ensure uniqueness.
I do not like this approach and moreover the actual implementation has subtle bugs but this is another story.
I am searching for a better approach, both in terms of coding and performance.
What I was thinking is the following:
Create a Set from the user's input list. If the Set has a different size than the list, then the user's input has duplicates. Stop there.
Load data from jdbc.
Create a HashMap<Long,String> with the user's input. The key is the primary key.
Loop over the result set. If the HashMap does not contain a key with the same value as the ResultSet row's id, then add it to the HashMap.
In the end, get the HashMap's values as a List. If it contains duplicates, reject the input.
This is the algorithm I came up with.
Is there a better approach than this? (I assume that I have not made a mistake in the algorithm itself.)
Purely from a performance point of view, why not let the database figure out that there are duplicates (like {1,X1},{2,X1})? Have a unique constraint in place on the table, and when the update statement fails by throwing an exception, catch it and handle these input conditions however you want. You may also want to run this as a single transaction in case you need to roll back any partial updates. Of course, this assumes that you don't have any other business rules driving the updates that you haven't mentioned here.
With your algorithm, you are spending too much time iterating over HashMaps and Lists to remove duplicates, IMHO.
Since you can't change the database, as stated in the comments, I would probably extend your Set idea. Create a HashMap<Long, String> and put all of the items from the database in it, then also create a HashSet<String> with all of the values from your database in it.
Then, as you go through the user input, check the key against the HashMap and see if the values are the same. If they are, then great: you don't have to do anything, because that exact input is already in your database.
If they aren't the same, then check the value against the HashSet to see if it already exists. If it does, then you have a duplicate.
Should perform much better than a loop.
Edit:
For multiple updates, perform all of the updates on the HashMap created from your database, then once again check the Map's value set to see if its size is different from the key set.
There might be a better way to do this, but this is the best I got.
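The "apply all updates to the map, then compare sizes" check from the edit could be sketched roughly as follows. UniqueValueCheck and updatesKeepValuesUnique are hypothetical names, both maps are id-to-value, and the sketch assumes the values loaded from the database are non-null and already unique.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

final class UniqueValueCheck {

    /**
     * Applies the user's updates to a copy of the database rows and reports
     * whether every value is still unique afterwards.
     */
    static boolean updatesKeepValuesUnique(Map<Long, String> dbRows, Map<Long, String> updates) {
        // Work on a copy so the caller's snapshot of the database is not modified.
        Map<Long, String> merged = new HashMap<>(dbRows);
        merged.putAll(updates);

        // A set drops repeated values, so a smaller set means some value occurs twice.
        return new HashSet<>(merged.values()).size() == merged.size();
    }
}
```

With {1,X2},{2,X1} in the database, the input {1,X1} is rejected because X1 would then appear in two rows, while an input that swaps the two values in one request is accepted because the check only looks at the final state.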
I'd opt for a database-side solution. Assuming a table with the columns id and value, you should make a list with all the "values", and use the following SQL:
select count(*) from tbl where value in (:values);
binding the :values parameter to the list of values however is appropriate for your environment. (Trivial when using Spring JDBC and a database that supports the in operator, less so for lesser setups. As a last resort you can generate the SQL dynamically.) You will get a result set with one row and one column of a numeric type. If it's 0, you can then insert the new data; if it's 1, report a constraint violation. (If it's anything else you have a whole new problem.)
If you need to check for every item in the user input, change the query to:
select value from tbl where value in (:values)
store the result in a set (called e.g. duplicates), and then loop over the user input items and check whether the value of the current item is in duplicates.
This should perform better than snarfing the entire dataset into memory.
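For setups without named-parameter support, the "generate the SQL dynamically" fallback can be limited to generating placeholders, so the values themselves are still bound as parameters. A small sketch, where InClauseBuilder and selectExistingValues are hypothetical names and tbl and value are the table and column assumed in the answer:

```java
import java.util.Collections;

final class InClauseBuilder {

    /** Builds the lookup query with one "?" placeholder per value to check. */
    static String selectExistingValues(int valueCount) {
        if (valueCount < 1) {
            // An empty IN () list is not valid SQL in most databases.
            throw new IllegalArgumentException("need at least one value");
        }
        String placeholders = String.join(", ", Collections.nCopies(valueCount, "?"));
        return "select value from tbl where value in (" + placeholders + ")";
    }
}
```

The returned string would be passed to a PreparedStatement and each value bound by position. Many databases limit the number of parameters per statement, so very large inputs may need to be checked in chunks.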

Retrieving data from ArrayList which contains database rows

I have retrieved some data from a DB and stored it in an ArrayList. The ArrayList contains some 50 rows, each row containing 4 columns. How do I access a particular column of a particular object in the ArrayList? Can someone help me with this?
Not sure what the exact issue is here. A List is index-based, and hence you can access any data by index. Another option is to use a Map, which allows you to refer to the data by a key of your choice.
Due to new data posted in comments on the question, this is now known not to be what the OP wants. I'd delete it but, given what I've read from him/her so far, I'm afraid he/she may be forever flummoxed by the disappearance of an answer.
ORIGINAL ANSWER
If you really have an ArrayList and not a ResultSet then do this
myList.get( desiredRow*column_width /*4*/ + desiredCol);
This assumes row-major ordering.
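As a self-contained sketch of that index arithmetic (FlatRows and cell are hypothetical names, and the list is assumed to hold the cells of all rows back to back, with rows and columns counted from zero):

```java
import java.util.List;

final class FlatRows {

    /** Returns the cell at (row, col) from a flat, row-major list of cells. */
    static <T> T cell(List<T> flatCells, int columnsPerRow, int row, int col) {
        if (col < 0 || col >= columnsPerRow) {
            // Without this check an oversized col would silently read from the next row.
            throw new IndexOutOfBoundsException("column " + col);
        }
        return flatCells.get(row * columnsPerRow + col);
    }
}
```

For 4 columns per row, the third column of the second row is cell(list, 4, 1, 2), which reads index 6 of the flat list.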

Insert fail then update OR Load and then decide if insert or update

I have a web service in Java that receives a list of records to be inserted or updated in a database. I don't know which ones are to be inserted and which updated.
Which is the best approach to obtain better performance:
Iterate over the list (an object list, with the table PK on it) and try to insert each entry into the database. If the insert fails, run an update.
Try to load the entry from the database. If a result is retrieved, update; if not, insert the entry.
Another option? Tell me about it :)
In the first calls, I believe that most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What would be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.
The important thing is to understand the balance or ratio between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this in the database". Then create concrete strategies that (for example):
Check for the primary key; if zero records are found, do the insert, else update.
Do the update and, if it fails, do the insert.
Others.
And then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible (this may depend on your domain), you can add a heuristic that selects the best strategy based on the input entities in the set.
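A rough Java sketch of that strategy idea follows. Everything in it is hypothetical: a Map stands in for the database table so that the example runs on its own, the strategies are selected by a short name rather than a fully qualified class name, and real implementations would issue SQL instead of map operations.

```java
import java.util.Map;

final class UpsertStrategies {

    /** Hypothetical abstraction for "persist this in the database". */
    interface PersistStrategy {
        void persist(Map<Long, String> table, long id, String value);
    }

    /** Looks the primary key up first, then decides between update and insert. */
    static final class CheckThenWrite implements PersistStrategy {
        @Override
        public void persist(Map<Long, String> table, long id, String value) {
            if (table.containsKey(id)) {
                table.replace(id, value); // row exists: update
            } else {
                table.put(id, value);     // no row: insert
            }
        }
    }

    /** Tries the update first and inserts only when no row was updated. */
    static final class UpdateThenInsert implements PersistStrategy {
        @Override
        public void persist(Map<Long, String> table, long id, String value) {
            // Map.replace returns null when there was no mapping to update.
            boolean updated = table.replace(id, value) != null;
            if (!updated) {
                table.put(id, value);
            }
        }
    }

    /** Picks a strategy from a configured name (the names are made up for this sketch). */
    static PersistStrategy byName(String name) {
        switch (name) {
            case "check-then-write":
                return new CheckThenWrite();
            case "update-then-insert":
                return new UpdateThenInsert();
            default:
                throw new IllegalArgumentException("unknown strategy: " + name);
        }
    }
}
```

Because callers only see PersistStrategy, switching between the two approaches (or adding a heuristic that chooses one per request) does not require touching the calling code.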
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
INSERT INTO
my_table (id, col1)
SELECT
_id, _col1;
EXCEPTION WHEN unique_violation THEN
UPDATE
my_table
SET
col1 = _col1
WHERE
id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
