Performance issues when calling MySQL stored procedure using Hibernate - java

I'm trying to understand why the execution time of my stored procedure is so much higher when I run it from Java using Hibernate than when I run it directly in MySQL.
The stored procedure itself is responsible for moving 20,000 rows from table A to table B and then deleting them from table A.
Running the stored procedure in MySQL takes around 18 seconds.
In Java, I'm using Hibernate and creating the query like this:
Query query =
mainSession
.createSQLQuery("{CALL my_stored_procedure(:maxResultSize)}")
.setParameter("maxResultSize", maxResultSize);
Then the query is executed and the session is flushed and cleared:
List<BigInteger> rows = query.list();
mainSession.flush();
mainSession.clear();
This takes around 248 seconds.
Does anyone know why it takes so much more time to call the stored procedure from Java using Hibernate?
What approach should I take to increase the performance?

Could you please try a native query? It is faster for me and works well.
List<Object[]> rows = (List<Object[]>) mySessionFactory.getCurrentSession()
        .createNativeQuery("{CALL my_stored_procedure(:maxResultSize)}")
        .setParameter("maxResultSize", maxResultSize)
        .getResultList();
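If you are on JPA 2.1 / Hibernate 4.3 or later, another option worth trying is the dedicated stored-procedure API, which bypasses the result-set mapping machinery entirely. A minimal sketch, assuming an EntityManager is available:
StoredProcedureQuery sp = entityManager
        .createStoredProcedureQuery("my_stored_procedure");
sp.registerStoredProcedureParameter(1, Integer.class, ParameterMode.IN);
sp.setParameter(1, maxResultSize);
sp.execute();
Also note that flush() dirty-checks every entity the session currently tracks, so if the session holds many managed entities when the procedure is called, the flush itself can account for a large part of the wall-clock time.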

Related

How to get data from Oracle table into java application concurrently

I have an Oracle table with ~10 million records that are not dependent on each other. An existing Java application executes the query and iterates through the returned Iterator, batching the records for further processing. The fetchSize is set to 250.
Is there any way to parallelize getting the data from the Oracle DB? One thing that comes to mind is to break down the query into chunks using "rowid" and then pass these chunks to separate threads.
I am wondering if there is some kind of standard approach in solving this issue.
A few approaches to achieve it:
alter session force parallel QUERY parallel 32; execute this at the DB level in the PL/SQL code just before the SELECT statement runs. You can adjust the value 32 depending on the number of nodes (RAC setup).
The ROWID-based approach you describe. The difficult part is how you return each chunk's SELECT results to Java and how you combine them, so this approach is a bit more involved; see the sketch below.
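For the ROWID idea, a minimal sketch under stated assumptions: the connection URL, the my_table name, and the process handler are hypothetical, and ORA_HASH over ROWID (a close variant of raw ROWID ranges) splits the table into 32 disjoint buckets that separate threads fetch over their own connections:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelRowidFetch {
    private static final int BUCKETS = 32;   // tune to node count, as above
    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/orcl"; // hypothetical

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(BUCKETS);
        for (int b = 0; b < BUCKETS; b++) {
            final int bucket = b;
            pool.submit(() -> fetchBucket(bucket));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Each worker owns its own connection and reads one disjoint slice:
    // ORA_HASH(ROWID, 31) assigns every row to exactly one of 32 buckets.
    private static void fetchBucket(int bucket) {
        String sql = "SELECT id, payload FROM my_table WHERE ORA_HASH(ROWID, "
                + (BUCKETS - 1) + ") = ?";
        try (Connection con = DriverManager.getConnection(URL, "user", "pass");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setFetchSize(250);             // same fetch size as the existing application
            ps.setInt(1, bucket);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    process(rs.getLong("id"), rs.getString("payload")); // hypothetical handler
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void process(long id, String payload) { /* application logic */ }
}
Because every row hashes into exactly one bucket, the threads never overlap, and no merging is needed beyond whatever aggregation the per-record processing already does.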

Processing large amount of data from PostgreSQL

I am looking for a way to process a large amount of data loaded from the database in a reasonable time.
The problem I am facing is that I have to read all the data from the database (currently around 30M rows) and then process it in Java. The processing itself is not the problem; fetching the data from the database is. The fetch generally takes 1-2 minutes, but I need it to be much faster than that. I am loading the data from the db straight into DTOs using the following query:
select id, id_post, id_comment, col_a, col_b from post_comment
Here id is the primary key, id_post and id_comment are foreign keys to the respective tables, and col_a and col_b are columns of small int data types. The foreign-key columns are indexed.
The tools I am using for the job currently are Java, Spring Boot, Hibernate and PostgreSQL.
So far, the only options that came to my mind were:
Ditch Hibernate for this query and use a plain JDBC connection, hoping that it will be faster.
Completely rewrite the processing algorithm from Java into an SQL procedure.
Did I miss something, or are these my only options? I am open to any ideas.
Note that I only need to read the data, not change it in any way.
EDIT: the EXPLAIN ANALYZE output of the query used:
"Seq Scan on post_comment (cost=0.00..397818.16 rows=21809216 width=28) (actual time=0.044..6287.066 rows=21812469 loops=1), Planning Time: 0.124 ms, Execution Time: 8237.090 ms"
Do you need to process all rows at once, or can you process them one at a time?
If you can process them one at a time, you should try using a scrollable result set.
org.hibernate.Query query = ...;
query.setReadOnly(true);
ScrollableResults sr = query.scroll(ScrollMode.FORWARD_ONLY);
while (sr.next()) {
    MyClass myObject = (MyClass) sr.get()[0];
    // ... process row for myObject ...
}
This will still keep every object in the entity manager, so it will get progressively slower. To avoid that, detach each object from the entity manager once you're done with it. This can only be done if the objects are not modified; if they are modified, the changes will NOT be persisted.
org.hibernate.Query query = ...;
query.setReadOnly(true);
ScrollableResults sr = query.scroll(ScrollMode.FORWARD_ONLY);
while (sr.next()) {
    MyClass myObject = (MyClass) sr.get()[0];
    // ... process row for myObject ...
    entityManager.detach(myObject);  // evict from the persistence context once processed
}
If I were in your shoes, I would definitely bypass Hibernate and go directly to JDBC for this query. Hibernate is not made for dealing with large result sets, and it adds overhead for benefits that do not apply to cases like this one.
When you use JDBC, do not forget to set autocommit to false and set a large fetch size (on the order of thousands), or else Postgres will first fetch all 21 million rows into memory before starting to yield them to you. (See https://stackoverflow.com/a/10959288/773113)
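A minimal sketch of that JDBC setup, assuming the PostgreSQL JDBC driver and the table from the question (the connection details and the process handler are placeholders). With autocommit off and a non-zero fetch size, the driver streams rows through a server-side cursor instead of buffering them all:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamPostComments {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://dbhost:5432/mydb";  // hypothetical
        try (Connection con = DriverManager.getConnection(url, "user", "pass")) {
            con.setAutoCommit(false);          // required for cursor-based fetching in Postgres
            try (PreparedStatement ps = con.prepareStatement(
                    "select id, id_post, id_comment, col_a, col_b from post_comment")) {
                ps.setFetchSize(5000);         // stream in chunks of 5000 rows
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        process(rs.getLong(1), rs.getLong(2), rs.getLong(3),
                                rs.getInt(4), rs.getInt(5));
                    }
                }
            }
        }
    }

    private static void process(long id, long idPost, long idComment, int a, int b) {
        // application logic goes here
    }
}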
Since you asked for ideas, I have seen this problem resolved with the options below, depending on how it fits your environment:
1) First try JDBC with plain Java: the code is simple, and you can do a test run on your database and data to see if the improvement is enough. You will have to give up the other benefits of Hibernate here.
2) Building on point 1, use multi-threading with multiple connections pulling data into one queue, which you can then drain for further processing or printing as needed (see the sketch after this list). You may also consider Kafka.
3) If the data keeps growing, you can consider Spark, which can hold it all in memory and will be much faster.
These are some of the options; please upvote if these ideas help you.
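A minimal sketch of option 2 under stated assumptions (the PostgreSQL URL, a known upper bound on id, and the process handler are hypothetical): four producer threads each read a disjoint id range over their own connection and feed one BlockingQueue, which a single consumer drains:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueuedFetch {
    static final long[] POISON = new long[0];          // end-of-stream marker

    public static void main(String[] args) throws Exception {
        int producers = 4;
        BlockingQueue<long[]> queue = new ArrayBlockingQueue<>(10000);
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        long maxId = 30000000L;                        // hypothetical upper bound of the keys
        long step = maxId / producers + 1;
        for (int i = 0; i < producers; i++) {
            final long lo = i * step, hi = lo + step;
            pool.submit(() -> fetchRange(lo, hi, queue));
        }
        // Signal end-of-stream once all producers have finished.
        new Thread(() -> {
            try {
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.HOURS);
                queue.put(POISON);
            } catch (InterruptedException ignored) { }
        }).start();
        long[] row;
        while ((row = queue.take()) != POISON) {
            process(row);                              // single consumer does the processing
        }
    }

    static void fetchRange(long lo, long hi, BlockingQueue<long[]> queue) {
        String url = "jdbc:postgresql://dbhost:5432/mydb";  // hypothetical
        String sql = "select id, id_post, id_comment from post_comment where id >= ? and id < ?";
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = con.prepareStatement(sql)) {
            con.setAutoCommit(false);
            ps.setFetchSize(5000);
            ps.setLong(1, lo);
            ps.setLong(2, hi);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    queue.put(new long[] { rs.getLong(1), rs.getLong(2), rs.getLong(3) });
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void process(long[] row) { /* application logic */ }
}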
Why keep 30M rows in memory? It's better to rewrite this as pure SQL and use keyset pagination based on id: the caller passes the id of the last comment seen (say 5) and you issue
select id, id_post, id_comment, col_a, col_b from post_comment where id > 5 limit 20
If you need to update the entire table, put the task in a cron job, but again process it in parts there. Memory is expensive and downloading all 30M rows at once is very costly; process them in chunks of 20 (0-20, then the next 20, and so on).
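A minimal sketch of that keyset-pagination loop over plain JDBC, using the question's post_comment table; note the added order by id, which keyset pagination requires, and lastId advancing to the last id seen on each page:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class KeysetPagination {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://dbhost:5432/mydb";  // hypothetical
        String sql = "select id, id_post, id_comment, col_a, col_b "
                + "from post_comment where id > ? order by id limit 20";
        long lastId = 0;
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = con.prepareStatement(sql)) {
            while (true) {
                ps.setLong(1, lastId);
                int fetched = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id");      // key for the next page
                        fetched++;
                        // ... process the row ...
                    }
                }
                if (fetched == 0) break;               // no more pages
            }
        }
    }
}
Because the where clause seeks directly to id > lastId on the primary-key index, each page costs the same regardless of how deep into the table you are, unlike offset-based pagination.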

How to bulk insert in db if another insert succeeds?

I have to insert ~40K records into 2 tables (say table1 and table2) in the database.
The insert into table2 is conditional: a record should be inserted into table2 if and only if a record was inserted into table1 successfully.
Can this be done in a batch? I'm using the JDBC driver with Oracle 10g XE.
What is the best approach to do this? Should I go for db pooling with multi-threading?
The executeUpdate method will return the number of rows affected by your statement. You could use this to check that it executed successfully (see the sketch below).
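A minimal sketch of that check inside one transaction; the connection URL, column lists, and the loadRecords source are all hypothetical. The insert into table2 runs only when the insert into table1 reports exactly one affected row, and both inserts commit or roll back together:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;

public class ConditionalInsert {
    static class Rec { long id; String val; String info; }   // hypothetical record shape

    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/XE";   // hypothetical 10g XE connection
        try (Connection con = DriverManager.getConnection(url, "user", "pass")) {
            con.setAutoCommit(false);                        // one transaction for both tables
            try (PreparedStatement ins1 = con.prepareStatement(
                         "insert into table1 (id, val) values (?, ?)");
                 PreparedStatement ins2 = con.prepareStatement(
                         "insert into table2 (table1_id, info) values (?, ?)")) {
                for (Rec r : loadRecords()) {                // hypothetical source of ~40K records
                    ins1.setLong(1, r.id);
                    ins1.setString(2, r.val);
                    if (ins1.executeUpdate() == 1) {         // insert into table2 only on success
                        ins2.setLong(1, r.id);
                        ins2.setString(2, r.info);
                        ins2.executeUpdate();
                    }
                }
                con.commit();
            } catch (Exception e) {
                con.rollback();
                throw e;
            }
        }
    }

    static List<Rec> loadRecords() { return new ArrayList<Rec>(); }
}
True JDBC batching (addBatch/executeBatch) would be faster for 40K rows, but executeBatch may report Statement.SUCCESS_NO_INFO per row on some drivers, which makes the per-row success check less direct; measure both if throughput matters.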
My suggestion is to perform the business logic for the operation as close to the data as possible. This means having a PL/SQL procedure act as an API for the functionality you wish to perform.
This will make your code trivial: a simple call to the database procedure, which returns something giving you the result.
All the logic applied to the data is performed by code designed almost exclusively to manipulate data, unlike Java, which can manipulate data but not as well as PL/SQL can. Incidentally, it is also likely to be much faster. (This presentation on YouTube is very informative, if a little long: https://www.youtube.com/watch?v=8jiJDflpw4Y)
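Calling such a procedure from Java then reduces to a CallableStatement; the procedure name insert_records and its parameters are hypothetical:
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class CallProcedure {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/XE";  // hypothetical
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             CallableStatement cs = con.prepareCall("{call insert_records(?, ?)}")) {
            cs.setInt(1, 40000);                       // hypothetical IN parameter
            cs.registerOutParameter(2, Types.INTEGER); // hypothetical OUT result code
            cs.execute();
            System.out.println("rows inserted: " + cs.getInt(2));
        }
    }
}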

Optimizing Hibernate session.createQuery().list();

We have a Users table (MySQL) with 120,000 rows
List<User> users = session.createQuery("from User").list();
This Hibernate query takes about 6 to 9 seconds to execute. How can we optimize it? Is MySQL the bottleneck, or is .list() usually this slow?
Of course it's slow: the query performs a full table scan. Join only the associated objects you need and add a where clause to the query, change the query to return a limited number of records, or use the Criteria API with a projection (see the sketch below).
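A minimal sketch of the projection idea with the classic Criteria API (deprecated in newer Hibernate but matching the API used here), assuming the User entity has id and name properties; only those two columns are fetched instead of full entities:
Criteria criteria = session.createCriteria(User.class)
        .setProjection(Projections.projectionList()
                .add(Projections.property("id"))
                .add(Projections.property("name")))
        .setMaxResults(100);                 // also cap the number of rows
List<Object[]> rows = criteria.list();       // each element is {id, name}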
Use pagination in your query. You should not fetch all rows at once. You can set the first position of the result and the maximum number of results. For example, if you want to read the first 100 results, change your query like this:
Query q = session.createQuery("from User");
q.setFirstResult(firstRes); // variable: offset of the first row
q.setMaxResults(maxRes);    // variable: maximum number of rows, passed as a parameter

Hibernate limit result inquiry

How does the maxResults property of a Hibernate query work? In the example below:
Query query = session.createQuery("from MyTable");
query.setMaxResults(10);
Does this get all rows from the database but display only 10 of them, or is this the same as LIMIT in SQL?
It's the same as LIMIT, but it is database-independent. For example, MS SQL Server does not have LIMIT, so Hibernate takes care of translating this; for MySQL it appends LIMIT 10 to the query.
So always use query.setMaxResults(..) and query.setFirstResult(..) instead of native SQL clauses.
In my case, setMaxResults retrieved all the rows and only displayed the number of rows that was set; I didn't manage to find a method that really retrieves only a limited number of rows.
My recommendation is to create the query yourself and put "rownum" or "limit" in it directly (see the sketch below).
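For example, a minimal native-query sketch on MySQL that pushes the limit into the SQL itself; my_table is a placeholder, and on Oracle the predicate would use rownum <= :max instead:
List<Object[]> rows = session
        .createSQLQuery("select * from my_table order by id limit :max")
        .setParameter("max", 10)
        .list();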
