Upsert in Spring Data - java

There is a table
columns: id(pk), name, attribute
unique constraint on (name, attribute).
A number of threads insert into the table if a record is not already there. Spring Data is used for that, and it is done in a transaction which could take some time. The records could be the same, meaning the same (name, attribute), simultaneously in a couple of threads. From time to time a race condition happens: thread A tries to commit a new record, but thread B has already committed the same record, which thread A did not see when it checked.
Are there any approaches for doing an upsert in this kind of situation?
If there are other suggestions for resolving this issue, I would be happy to hear them.

Either do it the JPA way:
Try to find the entity; if it is not there, save it.
If it is there, there is nothing to do, but of course you could update it by manipulating the found entity.
Alternatively, go SQL and write an actual upsert/merge statement, which many database dialects support.
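As a rough illustration of the SQL route, here is a minimal JDBC sketch. It assumes PostgreSQL (the ON CONFLICT clause is PostgreSQL syntax), a database-generated id, and a hypothetical table name my_table; none of these come from the question. Other dialects would need MERGE or INSERT ... ON DUPLICATE KEY UPDATE instead.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class UpsertSketch {
    // PostgreSQL syntax; "my_table" is a hypothetical table name.
    // The id column is assumed to be generated by the database.
    static final String INSERT_IF_ABSENT =
        "INSERT INTO my_table (name, attribute) VALUES (?, ?) "
        + "ON CONFLICT (name, attribute) DO NOTHING";

    /** Returns true if this call inserted the row, false if it already existed. */
    static boolean insertIfAbsent(Connection conn, String name, String attribute)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_IF_ABSENT)) {
            ps.setString(1, name);
            ps.setString(2, attribute);
            // 0 affected rows means the unique constraint already had a match.
            return ps.executeUpdate() == 1;
        }
    }
}
```

Because the database resolves the conflict inside a single statement, two threads inserting the same (name, attribute) should not fail with a constraint violation. In a Spring Data repository the same statement could presumably be placed in a native @Modifying @Query method.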

Related

Get identity after Instead of insert trigger

I am using Hibernate with MSSQL server, writing software that integrates with an existing database. There is an instead of insert trigger on the table that I need to insert into, and it messes up @@IDENTITY, which means that on Hibernate's save I can't get the id of the inserted row. I can't control the trigger (can't modify it). I saw this question, but it involves procedures, which my trigger does not have, so I thought my question is different enough. I can't post the whole trigger, but hopefully I can post enough to get the point across:
CREATE TRIGGER TrigName ON TableName
INSTEAD OF INSERT
AS
SET XACT_ABORT ON
BEGIN TRANSACTION
-- several DECLARE, SET statements
-- a couple of inserts into other tables for business logic
-- plain T-SQL statements without procedures or functions
...
-- this is the actual insert that i need to perform
-- to be honest, I don't quite understand how INSERTED table
-- was filled with all necessary columns by this point, but for now
-- I accept it as is (I am no SQL pro...)
INSERT INTO ClientTable (<columns>)
SELECT <same columns> from INSERTED
-- a couple of UPDATE queries to unrelated tables
...
COMMIT TRANSACTION;
I was wondering if there is a reliable way to get the id of the row being inserted? One solution I thought of and tried to make is to install an on insert trigger on the same table that writes the newly inserted row into a new table I added to the db. I'd use that table as a queue. After transaction commit in Hibernate I could go into that table and run a select with the info I just inserted (I still have access to it from the same method scope), and I can get the id and finally remove that row. This is a bulky solution, but best I can come up with so far.
I would really appreciate some help. I can't modify existing triggers and procedures, but I can add something to the db if it absolutely does not affect existing logic (like that new table and an on insert trigger).
To sum up: I need to find a way to get the ID of the row I just inserted with Hibernate's save call. Because of that instead of insert trigger, hibernate always returns identity=0. I need to find a way to get that ID because I need to do the insert in a few other tables during one transaction.
I think I found an answer for my question. To reply to @SeanLange's comment: I can't actually edit the insert code; it's done by another application, and a request to change it will take too long (or won't happen, since it's a legacy application). What I did is add another on insert trigger on the same table. Since I know the order of operations in the existing instead of insert trigger, I can see that the last insert operation will be into the table I want, which means my on insert trigger will fire right after that. In the scope of that trigger I have access to the inserted table, out of which I pull the id.
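CREATE TRIGGER Client_OnInsert ON myClientTable
FOR INSERT
AS
BEGIN
    DECLARE @ID int;
    -- assumes single-row inserts; with a multi-row insert this subquery
    -- returns more than one value and the SET fails
    SET @ID = (select ClientID from inserted);
    INSERT INTO ModClient (modClientId)
    OUTPUT @ID
    VALUES (@ID);
END
GO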
Then in Hibernate (since I can't use save() anymore), I use a NativeQuery to do this insert. I set the parameters and run the list() method of NativeQuery, which returns a List whose first and only element is the id I want.
This is a bulky way, I know. If there is anything that's really bad that will stand out to people - please let me know. I would really appreciate some feedback on this. However, I wanted to post this answer as a potential answer that worked so far, but it does not mean it's very good. For this solution to work I did have to create another small table ModClient, which I will have to use as a temp id storage for this exact purpose.

Prevent violating of UNIQUE constraint with Hibernate

I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations on this table. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK Hibernate merge works on PK only, so it's inappropriate. I guess, a native query might help or not, as it may or may not be committed when the second INSERT takes place.
Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
I'm not sure this scales well if the likelihood of a duplicate is high: it is a lot of extra work if some percentage (depending on the database) of transactions have to fail the insert and then reselect.
A secondary transaction minimizes the length of time that the transaction adding the new text takes, assuming the database supports it correctly. It might still be possible for the thread 1 transaction to cause the thread 2 select/insert to hang until the thread 1 transaction is committed or rolled back. Overall database design might also affect transaction throughput.
I don't necessarily question why sometext can't be a PK; I'm wondering why you need to break it out at all. Of course, with large volumes you might save substantial space if the sometext records are large. It almost seems like you're trying to emulate a Lucene index to give you a complete list of text values.
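The secondary-transaction approach suggested above boils down to "find, else insert, else find again". The following is a minimal sketch of that control flow only; DuplicateKeyException and FakeTable are hypothetical stand-ins (for the ConstraintViolationException and for the real table), and in a real application the insert function would have to run in its own transaction, for example a Spring method with REQUIRES_NEW propagation.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class FindOrInsertSketch {
    /** Hypothetical stand-in for the exception thrown on a UNIQUE violation. */
    static final class DuplicateKeyException extends RuntimeException {}

    /**
     * Looks up the id; if absent, tries to insert; if the insert loses a race,
     * reads the row that the other transaction committed.
     */
    static <K> int findOrInsert(K key,
                                Function<K, Optional<Integer>> find,
                                Function<K, Integer> insertInOwnTransaction) {
        Optional<Integer> existing = find.apply(key);
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            return insertInOwnTransaction.apply(key);
        } catch (DuplicateKeyException e) {
            // Another request inserted the same key first; reselect its row.
            return find.apply(key)
                       .orElseThrow(() -> new IllegalStateException("row not found: " + key));
        }
    }

    /** In-memory table with a unique key, used only to exercise the control flow. */
    static final class FakeTable {
        private final ConcurrentHashMap<String, Integer> rows = new ConcurrentHashMap<>();
        private final AtomicInteger nextId = new AtomicInteger(1);

        Optional<Integer> find(String text) {
            return Optional.ofNullable(rows.get(text));
        }

        int insert(String text) {
            int id = nextId.getAndIncrement();
            if (rows.putIfAbsent(text, id) != null) {
                throw new DuplicateKeyException(); // unique constraint violated
            }
            return id;
        }
    }
}
```

The reselect after the failed insert is only guaranteed to see the row if it runs in a transaction that can see the other thread's commit, which is one more reason to keep the insert attempt in a short transaction of its own.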

Hibernate Select on merge() seems useless

I have a single table and I am trying to understand the logic behind session.merge, but it seems somehow useless to me. I will try to explain more with some code.
public static void main(String[] args)
{
    final Merge clazz = new Merge();
    Ragonvalia ragonvalia = clazz.load(); // loaded from the database
    System.out.println("ORIGINAL: " + ragonvalia);
    // prints c02=1953
    clazz.session.evict(ragonvalia); // evict here to force merge to reload from the DB
    // here I make some modifications to the record directly in the DB
    try { Thread.sleep(20000); } catch (final Exception e) { e.printStackTrace(); }
    // now c02=2000 in the DB
    final Ragonvalia merge = ragonvalia = (Ragonvalia) clazz.session.merge(ragonvalia); // merge: the select is in fact fired
    System.out.println("MERGING");
    System.out.println("merge: " + merge);
    System.out.println("ragonvalia: " + ragonvalia);
    System.out.println(merge.toString().equals(ragonvalia.toString())); // prints that they are equal
    ragonvalia.setC01("PUEBLO LINDO"); // I modify the c01 field
    System.out.println(ragonvalia);
    final Transaction tx = clazz.session.beginTransaction();
    clazz.session.update(merge); // we update
    tx.commit();
    // at this point I can see that c02 was reset to 1953
    clazz.shutDown();
}
Yes, I know that merge is used for detached objects and all that, but what is really behind the select? Here is what I thought.
When I retrieved the record the first time, the field was c02=1953; later it was changed to c02=2000. I thought that when the merge was made, it would keep the newly changed value c02=2000, and that, since I did not modify the field in my session, the update would carry 2000 instead of the original 1953, so that nobody's work gets hurt. Instead it keeps 1953 and updates the field as 1953, so 1953 replaces the 2000 in the database, and of course the other person's work is lost.
I have read some material on the internet and found something like this: "Essentially, if you do not have a version or timestamp field, Hibernate must check the existing record before updating it so that concurrent modifications do not occur. You would not want a record updated that someone else modified after you read it. There are a couple of solutions, outlined in the link above. But it makes life much easier if you can add a version field on each table." That sounds great, but the part "before updating it so that concurrent modifications do not occur" is not happening: Hibernate just updates the fields I have in my class, even when they are not the same as in the current DB record.
"Hibernate must check the existing record before updating": checking for what? What does Hibernate check?
In fact I am not using any version in my models, but it seems the merge only checks that the record exists in the database.
I know this question is somewhat simple or a duplicate, but I just can't see the logic or the benefit of firing a select.
Summary
After the merge, Hibernate updates all the properties, even the unmodified ones. I don't know why; I thought Hibernate would update only the modified ones to gain performance. The values of those properties are the same as when the object was first loaded, or as modified by hand, so I think the merge was useless.
update
    ragonvalia
set
    .....
    .....
    .....
    c01=?,
    c02=?, -- here it is 1953, even though the value in the DB was 2000 when the merge was fired
    c03=?
where
    ID=?
Your question mixes what appears to be 2 concerns.
Dynamic Update Statements
Hibernate has had support for @DynamicUpdate since 4.1.
For updating, should this entity use dynamic sql generation where only changed
columns get referenced in the prepared sql statement.
Note, for re-attachment of detached entities this is not possible without the
@SelectBeforeUpdate annotation being used.
This simply means that within the bounds of an open session, any entity attached to the session and flagged with @DynamicUpdate will track field-level changes and issue DML statements that include only the altered fields.
Should your entity be detached from the session and you issue a merge, the @SelectBeforeUpdate annotation forces Hibernate to refresh the entity from the database, attach it to the session, and then determine the dirty attributes in order to write the DML statement with only the altered fields.
It's worth pointing out that this will not guard you against concurrent updates to the same database records in a highly concurrent environment. This is simply a means to minimize the DML statement for legacy or wide tables where a majority of the columns aren't changed.
Concurrent Changes
In order to deal with concurrent operations on the same data, you can approach this using two types of locking mechanics.
Pessimistic
In this situation, you would want to apply a lock at read time which basically forces the database to prevent any other transaction from reading/altering that row until the lock is released.
Since this type of locking can have severe performance implications, particularly on a highly concurrent table, it's generally not preferred. Depending on the database, row-level locks may eventually be escalated to the data page or, worse, the table, causing all other sessions to block until the locks are released.
public void alterEntityWithLockExample() {
    // Must run inside an active transaction; the lock is held until commit or rollback.
    // Read the row and apply a row-level update/write lock.
    Entity entity = entityManager.find(Entity.class, 1L, LockModeType.PESSIMISTIC_WRITE);
    // Update the entity and save it.
    entity.setField(someValue);
    entityManager.merge(entity);
}
It is probably worth noting that had any other session queried the entity with id 1 prior to the write lock being applied in the code above, the two sessions would still step on one another. All operations on Entity that would result in state changes would need to query using a lock in order to prevent concurrency issues.
Optimistic
This is a much more desirable approach in a highly concurrent environment. This is where you'd want to annotate a new field with @Version on your entity.
This has the benefit that you can query an entity, leave it detached, reattach it later, perhaps in another request or thread, alter it, and merge the changes. If the record had changed since it was originally fetched at the start of your process, an OptimisticLockException will be thrown, allowing your business case to handle that scenario however you need.
In a web application as an example, you may want to inform the user to requery the page, make their changes, and resave the form to continue.
Pessimistic locking is a proactive locking mechanism, whereas optimistic locking is reactive.
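To make the optimistic check concrete, here is a small in-memory sketch of the idea behind a version column. It is not Hibernate's implementation; Row, StaleRowException and the class itself are hypothetical names. The update method mimics an "UPDATE ... WHERE id = ? AND version = ?" statement that matches zero rows when someone else has changed the record since it was read.

```java
import java.util.HashMap;
import java.util.Map;

final class OptimisticLockSketch {
    /** Immutable snapshot of a row: a value plus the version it was read at. */
    static final class Row {
        final String value;
        final long version;

        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Hypothetical stand-in for OptimisticLockException. */
    static final class StaleRowException extends RuntimeException {}

    private final Map<Long, Row> table = new HashMap<>();

    synchronized void insert(long id, String value) {
        table.put(id, new Row(value, 0));
    }

    synchronized Row read(long id) {
        return table.get(id);
    }

    /**
     * Mirrors "UPDATE ... SET value = ?, version = version + 1
     * WHERE id = ? AND version = ?": the write succeeds only if nobody
     * changed the row since readVersion was fetched.
     */
    synchronized void update(long id, String newValue, long readVersion) {
        Row current = table.get(id);
        if (current == null || current.version != readVersion) {
            throw new StaleRowException(); // zero rows matched the WHERE clause
        }
        table.put(id, new Row(newValue, readVersion + 1));
    }
}
```

Applied to the question's scenario: the detached object was read at version 0 with c02=1953, another writer then stores 2000 and bumps the version to 1, and the later update with the stale version 0 is rejected instead of silently restoring 1953.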

Hibernate bulk update leads to in-query which takes for ever to complete

Article entity is a sub-class of the Product entity. The inheritance strategy for them is joined. Article#flag is a boolean attribute which I want to set false for all articles. Hence, I do
Query query = entityManager.createQuery("update Article set flag=:flagValue");
query.setParameter("flagValue", false);
query.executeUpdate();
I expected this to lead to a single SQL statement against the database which should complete fairly quickly. Instead, Hibernate populates a temporary table (which does not physically exist in the database) and then runs an IN query, i.e. the actual update:
insert into HT_article select article0_.id as id from schema.article article0_ inner join schema.product article0_1_ on article0_.id=article0_1_.id
update schema.article set flag=0 where (id) IN (select id from HT_article)
The actual update statement takes "forever" to complete and locks the affected articles thereby causing lock exceptions in other transactions. By forever I mean more than an hour for 130000 articles.
What's the explanation for this behavior and how could I solve it? Other than running a native query I mean...
Update 2011-05-12: it's terribly slow because the in-query is slow, I filed a bug for that -> http://opensource.atlassian.com/projects/hibernate/browse/HHH-5905
I am using InheritanceType.JOINED with Hibernate and faced a similar issue, where Hibernate was inserting into the temp table and was taking ages to delete the entities. I was using createQuery and executeUpdate to delete the records, which caused the issue.
I started using session.delete(entity), which solved the issue for me.
Because you're using InheritanceType.JOINED, Hibernate has no choice but to unify the subclass and its base class.
If you switched to InheritanceType.TABLE_PER_CLASS (http://openjpa.apache.org/builds/1.0.2/apache-openjpa-1.0.2/docs/manual/jpa_overview_mapping_inher.html) you'd avoid this and get your performance back. But I'm sure you're using JOINED for a reason :-(

Insert a lot of data into database in very small inserts

So I have a database into which a lot of data is inserted from a Java application. Usually I insert into table1 and get the last id, then insert into table2 and get the last id from there, and finally insert into table3, get that id as well, and work with it within the application. I insert around 1000-2000 rows of data every 10-15 minutes.
Using a lot of small inserts and selects on a production web server is not really good, because it sometimes bogs down the server.
My question is: is there a way to insert multiple rows into table1, table2, table3 without using such a huge number of selects and inserts? Is there a sql-fu technique I'm missing?
Since you're probably relying on auto_increment primary keys, you have to do the inserts one at a time, at least for table1 and table2, because MySQL won't give you more than the very last key generated.
You should never have to select. You can get the last inserted id from the Statement using the getGeneratedKeys() method. See an example showing this in the MySQL manual for the Connector/J:
http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-basic.html#connector-j-examples-autoincrement-getgeneratedkeys
Other recommendations:
Use multi-row INSERT syntax for table3.
Use ALTER TABLE DISABLE KEYS while you're importing, and re-enable them when you're finished.
Use explicit transactions. I.e. begin a transaction before your data-loading routine, and commit at the end. I'd probably also commit after every 1000 rows of table1.
Use prepared statements.
Unfortunately, you can't use the fastest method for bulk load of data, LOAD DATA INFILE, because that doesn't allow you to get the generated id values per row.
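Two of the recommendations above, reading the key via getGeneratedKeys() and using multi-row INSERT syntax for table3, can be sketched with plain JDBC as follows. The column names (name, table2_id, payload) are hypothetical, since the question does not show the schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Collections;
import java.util.List;

final class ParentChildInsertSketch {
    /** Inserts one table1 row and returns its auto-generated key without a SELECT. */
    static long insertParent(Connection conn, String name) throws SQLException {
        String sql = "INSERT INTO table1 (name) VALUES (?)"; // column "name" is hypothetical
        try (PreparedStatement ps =
                 conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, name);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("no generated key returned");
                }
                return keys.getLong(1);
            }
        }
    }

    /**
     * Builds "INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ..." for rowCount rows.
     * Table and column names must come from trusted code, never from user input.
     */
    static String multiRowInsertSql(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        String oneRow = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES "
            + String.join(", ", Collections.nCopies(rowCount, oneRow));
    }
}
```

The generated statement would then be prepared once and its placeholders filled row by row, so that all table3 rows for one parent travel in a single round trip.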
There's a lot to talk about here:
It's likely that network latency is killing you if each of those INSERTs is another network roundtrip. Try batching your requests so they only require a single roundtrip for the entire transaction.
Speaking of transactions, you don't mention them. If all three of those INSERTs need to be a single unit of work you'd better be handling transactions properly. If you don't know how, better research them.
Try caching requests if they're reused a lot. The fastest roundtrip is the one you don't make.
You could redesign your database such that the primary key is not a database-generated, auto-incremented value, but rather a client-generated UUID. Then you could generate all the keys for every record upfront and batch the inserts however you like.
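A minimal sketch of the client-generated key idea follows, assuming each input item produces one row in each of the three tables and that keys are stored as strings. The Row and Plan types, the three-column layout and the insertBatch helper are hypothetical; only UUID.randomUUID() and the JDBC batch calls are standard API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class ClientKeySketch {
    static final class Row {
        final UUID id;
        final UUID parentId; // null for table1 rows
        final String payload;

        Row(UUID id, UUID parentId, String payload) {
            this.id = id;
            this.parentId = parentId;
            this.payload = payload;
        }
    }

    static final class Plan {
        final List<Row> table1 = new ArrayList<>();
        final List<Row> table2 = new ArrayList<>();
        final List<Row> table3 = new ArrayList<>();
    }

    /** Assigns every key up front, so no insert has to wait for a generated id. */
    static Plan plan(List<String> payloads) {
        Plan plan = new Plan();
        for (String payload : payloads) {
            Row r1 = new Row(UUID.randomUUID(), null, payload);
            Row r2 = new Row(UUID.randomUUID(), r1.id, payload);
            Row r3 = new Row(UUID.randomUUID(), r2.id, payload);
            plan.table1.add(r1);
            plan.table2.add(r2);
            plan.table3.add(r3);
        }
        return plan;
    }

    /** Sends all rows of one table in a single JDBC batch; sql needs three placeholders. */
    static void insertBatch(Connection conn, String sql, List<Row> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Row r : rows) {
                ps.setString(1, r.id.toString());
                ps.setString(2, r.parentId == null ? null : r.parentId.toString());
                ps.setString(3, r.payload);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

With the keys known in advance, the three batches can be sent in parent-to-child order inside one transaction, without reading anything back in between.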
