How to update the first N rows with JPA and Hibernate - java

I want to update only the first N rows; in SQL:
UPDATE Table1 SET c1 = 'XXX' WHERE Id IN (SELECT TOP 10 Id FROM Table1 ORDER BY c2)
Can Hibernate do that in ONE update?

With Hibernate, you can always issue a native query like that, but the currently running Persistence Context won't be aware of the updated entries.
As long as you only update a relatively small number of entries, you can simply select the N entities and modify them one by one, so that you benefit from optimistic locking checks and prevent lost updates.
If you want to update lots of entries, then a bulk update query is much more appropriate. You can even run the SQL UPDATE query that you mentioned. That's exactly the reason why JPA and Hibernate allow you to use native SQL queries anyway.
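If you go the native route, here is a minimal sketch, assuming an injected EntityManager and an active transaction (the statement is the one from the question):

int updated = entityManager.createNativeQuery(
        "UPDATE Table1 SET c1 = 'XXX' " +
        "WHERE Id IN (SELECT TOP 10 Id FROM Table1 ORDER BY c2)")
    .executeUpdate();
// Note: entities already loaded in the Persistence Context will not
// reflect this change unless you refresh or clear them.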

Related

Spring JPA - Reading data along with relations - performance improvement

I am reading data from a table using Spring JPA.
This entity has one-to-many relationships to six other tables.
All the tables together have 20,000 records in them.
I am using the below query to fetch the data from the DB:
SELECT * FROM A WHERE ID IN (SELECT ID FROM B WHERE COL1 = '?')
Table A has relationships to 6 other tables.
Spring JPA is taking around 30 seconds to read this data from the DB.
Any ideas to improve the data fetch time here?
I am using native queries, and I am looking for a query rewrite that will optimize the data fetch time.
Please suggest, thanks.
You might need to consider the following to identify the root cause:
Check if you are running into the N+1 query issue: your query might trigger an extra query per fetched row for each of the associated tables. You can check this by setting spring.jpa.show-sql=true.
If you see the N+1 issue, then you need to set an appropriate FetchMode; refer to https://www.baeldung.com/hibernate-fetchmode for a detailed explanation of the different FetchModes (a sketch follows these steps).
If it is not an N+1 query issue, you might need to check the performance of the generated queries using the EXPLAIN command. An IN clause on non-indexed columns usually has a performance impact.
So set spring.jpa.show-sql=true, check the queries that are generated and run, and use them to debug and optimize your code or query.
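For illustration, a minimal sketch of one FetchMode option, assuming the entities are loaded through Hibernate and using illustrative entity names (A with a collection of B; JPA imports omitted):

import org.hibernate.annotations.Fetch;
import org.hibernate.annotations.FetchMode;

@Entity
public class A {
    @Id
    private Long id;

    // SUBSELECT tells Hibernate to load this collection for all A rows
    // fetched in the session with one extra query, instead of one query
    // per row (the N+1 pattern).
    @OneToMany(mappedBy = "a")
    @Fetch(FetchMode.SUBSELECT)
    private Set<B> items = new HashSet<>();
}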

Hibernate @DynamicUpdate

We have a table, say 'A', in our system that has around 120 columns (I know the table is not normalised; table normalisation exists as a backlog item on our roadmap).
Currently, we are facing a problem where multiple threads updating the same table row overwrite columns that another thread has just updated. We do not have the @DynamicUpdate annotation on our table entity.
I was thinking of adding that annotation to the entity, as it could solve our problem: different threads work on different columns of the table, and if the SQL is generated at runtime to update only the changed columns, one thread will not overwrite updates made to other columns by another thread.
But I read that Hibernate caches the insert and update SQL for a table, covering all columns, when @DynamicUpdate is not present.
So, will using @DynamicUpdate actually be beneficial here, or would the cost of generating dynamic SQL at runtime cause a performance slow-down?
The table in question has millions of records and an insert or update happens every other second (updates being more frequent).
I would definitely recommend @DynamicUpdate, and let me explain why:
a) DB performance: by lowering the number of columns to update, you cut down the DB server's overhead of checking referential integrity and constraints for each column (primary key, foreign key, unique key, not null, etc.) and of updating each index that includes an updated column.
b) You are only trading off rebuilding the cached query plan every time, but in a multi-threaded environment, remember, this will not make a difference, as cached query plans are only valid for the life of a given session and are thus only useful for consecutive updates in the same DB session.
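As a concrete illustration, a minimal sketch of the mapping (entity and column names are placeholders; JPA imports omitted):

import org.hibernate.annotations.DynamicUpdate;

@Entity
@DynamicUpdate // generated UPDATE statements include only the columns that changed
public class A {
    @Id
    private Long id;

    private String col1; // updated by one thread
    private String col2; // updated by another thread
    // ... the remaining ~120 columns
}

Note that @DynamicUpdate only narrows which columns are written; if two threads read-modify-write the same column, you would still want optimistic locking (a @Version attribute) to detect the conflict.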

Change of design of queries to improve performance

This is more like a design question but related to SQL optimization as well.
My project has to import a large number of records into the database (more than 100k records). In the meantime, the project has logic to check each record to make sure it meets criteria which are configurable. It then marks the record as 'no warning' or 'has warning' in the database. The inserting and warning checking are done within one import process.
For each criterion it has to query the database. The query needs to join two other tables and sometimes adds an additional nested query inside the conditions, such as:
select * from TableA a
join TableB on ...
join TableC on ...
where
(select count(*) from TableA
where TableA.Field = Bla) > 100
Although each individual query takes negligible time, querying the entire record set takes a considerable amount of time, which may be 4-5 hours on a server. Especially when there are many criteria, the project ends up stopping the import and rolling back.
I've tried changing "SELECT * FROM" to "SELECT TableA.ID FROM", but it seems to have no effect at all. Is there a better design to improve the performance of this process?
How about making a temp table (or more than one) that stores the aggregated results of the sub-queries, and then indexing it with a covering index?
From your code above, we'd make a temp table grouped on TableA.Field1 and including a count, then index it on (Field1, theCount); a sketch follows this explanation. On SQL Server the fastest approach would then be:
select * from TableA a
join TableB on ...
join TableC on ...
join (select Field1 from #temp1 where theCount > 100) t on...
The reason this works is that we are doing the same trick twice.
First, we pre-aggregate into the temp table, which is a simple operation and very easy for SQL Server to optimize. So we have taken a piece of the problem and solved it in an optimizable way.
Then we repeat this trick by joining to a subquery, putting the filter inside the subquery, so that the join acts as a filter.
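A rough sketch of the whole sequence over plain JDBC, assuming SQL Server (temp tables are connection-scoped, so everything must run on one connection); table and column names follow the answer above and are placeholders, and the TableB/TableC joins from the question are elided:

try (Connection con = dataSource.getConnection();
     Statement st = con.createStatement()) {
    // Pre-aggregate into a temp table: one simple pass that the
    // optimizer handles easily.
    st.executeUpdate("SELECT Field1, COUNT(*) AS theCount " +
            "INTO #temp1 FROM TableA GROUP BY Field1");
    // Covering index so the join below can seek instead of scan.
    st.executeUpdate("CREATE INDEX ix_temp1 ON #temp1 (Field1, theCount)");
    // Join to the filtered subquery; the join itself does the filtering.
    try (ResultSet rs = st.executeQuery("SELECT a.* FROM TableA a " +
            "JOIN (SELECT Field1 FROM #temp1 WHERE theCount > 100) t " +
            "ON t.Field1 = a.Field1")) {
        while (rs.next()) {
            // apply the warning check to each row...
        }
    }
}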
I would suggest you batch your records together (500 or so at a time) and send them to a stored proc which can do the calculation.
Use simple statements instead of joins in there; that saves time as well.
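A hedged sketch of that batching idea, assuming a hypothetical stored procedure dbo.CheckWarnings that accepts a comma-separated id list (the proc name, parameter format, and batch size are all illustrative):

try (Connection con = dataSource.getConnection();
     CallableStatement cs = con.prepareCall("{call dbo.CheckWarnings(?)}")) {
    for (int i = 0; i < allIds.size(); i += 500) {
        List<Long> batch = allIds.subList(i, Math.min(i + 500, allIds.size()));
        String idList = batch.stream().map(String::valueOf)
                .collect(Collectors.joining(","));
        cs.setString(1, idList);
        cs.execute(); // the proc marks each record as warning / no warning
    }
}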
A good choice is using an indexed view.
http://msdn.microsoft.com/en-us/library/dd171921(SQL.100).aspx

Hibernate bulk update leads to an IN query which takes forever to complete

The Article entity is a subclass of the Product entity; the inheritance strategy for them is JOINED. Article#flag is a boolean attribute which I want to set to false for all articles. Hence, I do:
Query query = entityManager.createQuery("update Article set flag=:flagValue");
query.setParameter("flagValue", false);
query.executeUpdate();
I expected this to lead to a single SQL statement against the database, which should complete fairly quickly. Instead, Hibernate populates a temporary table (which does not physically exist in the database) and then runs the update as an IN query:
insert into HT_article select article0_.id as id from schema.article article0_ inner join schema.product article0_1_ on article0_.id=article0_1_.id
update schema.article set flag=0 where (id) IN (select id from HT_article)
The actual update statement takes "forever" to complete and locks the affected articles, thereby causing lock exceptions in other transactions. By forever I mean more than an hour for 130,000 articles.
What's the explanation for this behavior, and how could I solve it? Other than running a native query, I mean...
Update 2011-05-12: it's terribly slow because the IN query is slow; I filed a bug for that -> http://opensource.atlassian.com/projects/hibernate/browse/HHH-5905
I am using Hibernate's InheritanceType.JOINED and faced a similar issue where Hibernate was inserting into the temp table and taking ages to delete the entities. I was using createQuery and executeUpdate to delete the records, which caused the issue.
I started using session.delete(entity) instead, which solved the issue for me.
Because you're using InheritanceType.JOINED, Hibernate has no choice but to unify the subclass and its base class.
If you switched to InheritanceType.TABLE_PER_CLASS (http://openjpa.apache.org/builds/1.0.2/apache-openjpa-1.0.2/docs/manual/jpa_overview_mapping_inher.html) you'd avoid this and get your performance back. But I'm sure you're using JOINED for a reason :-(
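For reference, a minimal sketch of that alternative mapping (illustrative only; note that TABLE_PER_CLASS forbids IDENTITY id generation and turns polymorphic queries into UNIONs):

@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public class Product {
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE) // IDENTITY is not portable here
    private Long id;
}

@Entity
public class Article extends Product {
    // a bulk update of Article now hits a single table, no temp table needed
    private boolean flag;
}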

Hibernate; HQL; why does the delete query not work, but select does?

I want to delete certain records from a table. These records have some child records in other tables.
In order to be able to delete the main records, I have to delete the child records first.
Here is the example of the HQL used:
delete from ItineraryBooking ib where ib.booking.user.id = :paramId
Basically, this should remove all ItineraryBookings (records in a separate table); these are joined to the Booking table. A Booking can in turn be joined with the User table.
The odd thing is that when you change the above to:
from ItineraryBooking ib where ib.booking.user.id = :paramId
and execute a Query.list(), it works just fine.
Whenever I want to execute the delete variant, it looks like Hibernate generates an odd delete statement. Is my HQL wrong? Or is it a Hibernate quirk?
From the Hibernate manual:
"No joins, either implicit or explicit, can be specified in a bulk HQL query. Sub-queries can be used in the where-clause, where the subqueries themselves may contain joins."
Your ib.booking.user.id clause looks like a join to me. I don't know whether Hibernate actively rejects joins in a delete statement or just silently gets it wrong.
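A sketch of a rewrite along those lines, moving the implicit join into a where-clause subquery (untested, assuming the mappings from the question):

Query query = session.createQuery(
    "delete from ItineraryBooking ib " +
    "where ib.booking in (select b from Booking b where b.user.id = :paramId)");
query.setParameter("paramId", paramId);
query.executeUpdate();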
A nicer way to delete child records is to use cascading deletes.
Simple questions that may help:
Just out of curiosity, are you running this HQL in a transaction? Selects don't need a transaction, but deletes do.
Are you flushing the session after executing the deletion HQL?
Are you deleting and selecting in the same transaction, or in separate ones?
