Java Hibernate generating one-off queries taking minutes to complete

When running queries, Hibernate loads related records with one-off queries.
Short version: can someone verify that this is an N+1 type issue?
And if so, point me to a good resource on resolving it?
Some queries that my application runs return thousands of records. This is normal; however, what I think is happening is that Hibernate then loads the related records using separate one-off queries.
In my case, I am querying the DB about 6 times per record returned by the outermost query, i.e. if there are 500 results in the original query, about 3,000 queries are run in total.
What I think is happening:
Imagine I have a people table in the DB; I may also have an emails table, a phone numbers table, and an addresses table. I think that when I query the people table, Hibernate fetches the related records from phone numbers, emails, and so on. In my case, looking at the generated SQL, I can see that Hibernate is running queries like this:
11:56:47,413 INFO [stdout] (default task-3) Hibernate: select identityen0_.id as id1_14_, identityen0_.auth_code as auth_cod2_14_, identityen0_.auth_provider_name as auth_pro3_14_, identityen0_.auth_provider_user_access_token as auth_pro4_14_, identityen0_.created_timestamp as created_5_14_, identityen0_.expiration as expirati6_14_, identityen0_.last_updated_timestamp as last_upd7_14_, identityen0_.person_id as person_10_14_, identityen0_.user_auth_provider_id as user_aut8_14_, identityen0_.username as username9_14_ from identities identityen0_ where identityen0_.auth_code=?
Notice that there are hundreds of these queries (one for each identity, i.e. person).
I think this is because the query ends with where identityen0_.auth_code=?, which implies that Hibernate is running a separate query for each auth code in a list that it holds, fetching the identity info one at a time.
The overall query takes minutes to complete and I am trying to speed it up. The obvious starting point would be to run fewer DB queries (average DB latency is 50-250 ms). Where should I even start? Surely Hibernate supports some way to resolve this kind of issue, right?
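As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in plain Java (no Hibernate involved). The class and method names are made up for illustration, and it assumes the queries run strictly one after another, each costing a full round trip at the quoted 50-250 ms latency.

```java
import java.time.Duration;

/** Hypothetical helper: estimates the cost of an N+1 access pattern. */
final class NPlusOneEstimate {

    /** One outer query plus perParent follow-up queries for each parent row. */
    public static long totalQueries(long parentRows, long perParent) {
        return 1 + parentRows * perParent;
    }

    /** Total wall-clock time if the queries run sequentially (an assumption). */
    public static Duration sequentialTime(long queries, Duration latencyPerQuery) {
        return latencyPerQuery.multipliedBy(queries);
    }

    public static void main(String[] args) {
        long queries = totalQueries(500, 6); // 500 results, about 6 lookups each
        System.out.println("queries: " + queries);
        System.out.println("at 50 ms:  " + sequentialTime(queries, Duration.ofMillis(50)));
        System.out.println("at 250 ms: " + sequentialTime(queries, Duration.ofMillis(250)));
    }
}
```

With 500 outer rows and 6 lookups each, that comes to roughly 2.5 minutes at 50 ms and 12.5 minutes at 250 ms, which is consistent with the "minutes" observed, so the round trips alone could plausibly explain the delay.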
Using hibernate-entitymanager 5.3.20.Final
Thanks for any help.

Related

Spring JPA - Reading data along with relations - performance improvement

I am reading data from a table using Spring JPA.
This entity has one-to-many relationships to six other tables.
All tables together have 20,000 records in them.
I am using the query below to fetch data from the DB.
SELECT * FROM A WHERE ID IN (SELECT ID FROM B WHERE COL1 = '?')
Table A has relationships to 6 other tables.
Spring JPA takes around 30 seconds to read this data from the DB.
Any ideas on how to improve the data fetch time here?
I am using native queries here and I am looking for a query rewrite that will optimize the data fetch time.
Please suggest. Thanks.
You might need to consider the following to identify the root cause:
Check whether you are ending up with an N+1 query issue. Your query might end up issuing n additional queries for each join table, where n is the number of associations with the join table. You can check this by setting spring.jpa.show-sql=true.
If you do see an N+1 issue, you need to set an appropriate FetchMode; see https://www.baeldung.com/hibernate-fetchmode for a detailed explanation of the different FetchModes.
If it is not an N+1 issue, you might need to check the performance of the generated queries using the EXPLAIN command. An IN clause on non-indexed columns usually hurts performance.
So set spring.jpa.show-sql=true, check which queries are generated and run, and use that to debug and optimize your code or query.
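Once show-sql is on, one quick way to spot an N+1 pattern is to count how often the same statement text repeats in the log. Below is a minimal sketch in plain Java; the class name is hypothetical, the sample log lines are invented, and it assumes each statement is logged on a single line after the "Hibernate: " prefix (i.e. SQL formatting is off).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: counts repeated statements in show-sql output. */
final class SqlLogCounter {

    private static final String MARKER = "Hibernate: ";

    /** Counts how often each logged SQL statement appears, in first-seen order. */
    public static Map<String, Integer> countStatements(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            int at = line.indexOf(MARKER);
            if (at < 0) {
                continue; // not a show-sql line
            }
            String sql = line.substring(at + MARKER.length()).trim();
            counts.merge(sql, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample: one outer query followed by per-row lookups.
        List<String> sample = List.of(
            "Hibernate: select a0_.id from a a0_ where a0_.id in (select b.id from b where b.col1=?)",
            "Hibernate: select c0_.id from c c0_ where c0_.a_id=?",
            "Hibernate: select c0_.id from c c0_ where c0_.a_id=?",
            "Hibernate: select c0_.id from c c0_ where c0_.a_id=?");
        countStatements(sample).forEach((sql, n) -> System.out.println(n + "x " + sql));
    }
}
```

A statement whose count grows with the number of rows returned by the outer query is the usual signature of N+1.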

Processing large amount of data from PostgreSQL

I am looking for a way to process a large amount of data loaded from the database in a reasonable time.
The problem I am facing is that I have to read all the data from the database (currently around 30M rows) and then process it in Java. The processing itself is not the problem, but fetching the data from the database is. The fetching generally takes 1-2 minutes. However, I need it to be much faster than that. I am loading the data from the DB straight into a DTO using the following query:
select id, id_post, id_comment, col_a, col_b from post_comment
Here id is the primary key, id_post and id_comment are foreign keys to the respective tables, and col_a and col_b are columns of a small int data type. The foreign key columns are indexed.
The tools I am currently using for the job are Java, Spring Boot, Hibernate and PostgreSQL.
So far the only options that have come to mind are:
Ditch Hibernate for this query and try a plain JDBC connection, hoping that it will be faster.
Completely rewrite the processing algorithm from Java into an SQL procedure.
Did I miss something, or are these my only options? I am open to any ideas.
Note that I only need to read the data, not change it in any way.
EDIT: The EXPLAIN ANALYZE output of the query:
Seq Scan on post_comment (cost=0.00..397818.16 rows=21809216 width=28) (actual time=0.044..6287.066 rows=21812469 loops=1)
Planning Time: 0.124 ms
Execution Time: 8237.090 ms
Do you need to process all rows at once, or can you process them one at a time?
If you can process them one at a time, you should try using a scrollable result set.
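// Requires org.hibernate.ScrollMode and org.hibernate.ScrollableResults.
org.hibernate.Query query = ...;
query.setReadOnly(true); // entities are only read, so no dirty-checking snapshots are needed
ScrollableResults sr = query.scroll(ScrollMode.FORWARD_ONLY);
while (sr.next()) {
    // Each row is an Object[]; the entity is its first element.
    MyClass myObject = (MyClass) sr.get()[0];
    // ... process row for myObject ...
}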
The entity manager will still hold on to every object loaded this way, so the loop gets progressively slower. To avoid that issue, you might detach each object from the entity manager once you are done with it. This can only be done if the objects are not modified. If they are modified, the changes will NOT be persisted.
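// Requires org.hibernate.ScrollMode and org.hibernate.ScrollableResults.
org.hibernate.Query query = ...;
query.setReadOnly(true);
ScrollableResults sr = query.scroll(ScrollMode.FORWARD_ONLY);
while (sr.next()) {
    MyClass myObject = (MyClass) sr.get()[0];
    // ... process row for myObject ...
    // Release the entity so the persistence context does not keep growing.
    entityManager.detach(myObject);
}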
If I was in your shoes I would definitely bypass hibernate and go directly to JDBC for this query. Hibernate is not made for dealing with large result sets, and it represents an additional overhead for benefits that are not applicable to cases like this one.
When you use JDBC, do not forget to set autocommit to false and set a large fetch size (on the order of thousands), or else the PostgreSQL JDBC driver will first fetch all 21 million rows into memory before starting to yield them to you. (See https://stackoverflow.com/a/10959288/773113)
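A minimal sketch of what that could look like with plain JDBC. The class name, the RowHandler callback, and the column types (ids as long, col_a and col_b as short) are assumptions for illustration and not taken from the question; obtaining the Connection (URL, credentials, driver) is left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper that streams post_comment rows instead of loading them all. */
final class PostCommentStreamer {

    /** Hypothetical callback; the column types are assumptions. */
    interface RowHandler {
        void handle(long id, long idPost, long idComment, short colA, short colB);
    }

    static final String SQL =
        "select id, id_post, id_comment, col_a, col_b from post_comment";

    /** Streams every row through the handler and returns the number of rows read. */
    public static long stream(Connection conn, int fetchSize, RowHandler handler)
            throws SQLException {
        // Autocommit must be off, otherwise the fetch size hint is not honoured
        // and the whole result set is buffered on the client.
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setFetchSize(fetchSize); // rows pulled per round trip
            long count = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs.getLong(1), rs.getLong(2), rs.getLong(3),
                                   rs.getShort(4), rs.getShort(5));
                    count++;
                }
            }
            conn.commit(); // end the read-only transaction
            return count;
        }
    }
}
```

Usage would be along the lines of PostCommentStreamer.stream(conn, 5000, handler); the value 5000 is only one example of "on the order of thousands" and is worth tuning. Note that the sketch leaves autocommit switched off on the connection it was given.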
Since you asked for ideas, I have seen this problem resolved with the options below, depending on what fits your environment:
1) First try plain JDBC and Java: the code is simple, and you can do a test run against your database and data to see whether the improvement is enough. You will have to give up the other benefits of Hibernate here.
2) Building on point 1, use multi-threading with multiple connections pulling data into one queue, then consume that queue to process further or print as you need. You may also consider Kafka.
3) If the data is going to keep growing, you can consider Spark, a more recent technology that can do all of this in memory and will be much faster.
These are some of the options; please like if these ideas help you.
Why do you keep 30M rows in memory?
It's better to rewrite this in pure SQL and use pagination based on id.
The caller passes in the id of the last comment it has seen, for example 5, and you issue:
select id, id_post, id_comment, col_a, col_b from post_comment where id > 5 order by id limit 20
If you need to update the entire table, put the task in cron, but process it in parts there as well.
Memory is expensive and downloading 30M rows is very costly; you need to process the data in parts: 0-20, 20-n, n+20, and so on.
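The paging loop itself can be sketched as follows. To keep the example runnable without a database, the page source is abstracted behind a hypothetical PageSource interface and backed by an in-memory list; in real code it would run the "where id > ? order by id limit ?" query. All names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

/** Hypothetical helper: walks a table in id order, one page at a time. */
final class KeysetPager {

    /** Returns up to limit ids greater than lastId, in ascending order. */
    interface PageSource {
        List<Long> fetchAfter(long lastId, int limit);
    }

    /** Processes every row page by page and returns the number of rows seen. */
    public static long forEachRow(PageSource source, int pageSize, Consumer<Long> handler) {
        long lastId = Long.MIN_VALUE; // assumes no row has this id
        long seen = 0;
        while (true) {
            List<Long> page = source.fetchAfter(lastId, pageSize);
            if (page.isEmpty()) {
                return seen; // no rows left
            }
            for (Long id : page) {
                handler.accept(id);
                seen++;
            }
            lastId = page.get(page.size() - 1); // highest id processed so far
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the post_comment table (ids only, ascending).
        List<Long> table = new ArrayList<>();
        for (long i = 1; i <= 45; i++) {
            table.add(i);
        }
        PageSource inMemory = (lastId, limit) -> table.stream()
                .filter(id -> id > lastId)
                .limit(limit)
                .collect(Collectors.toList());

        long seen = forEachRow(inMemory, 20, id -> { /* process one row */ });
        System.out.println("rows processed: " + seen);
    }
}
```

Only one page is held in memory at a time, and because each page starts from the last id seen rather than from an offset, later pages should not get slower as long as id is indexed.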

HibernateTemplate is taking too much time to execute queries

I am using HibernateTemplate with an Oracle database, and executing even simple queries takes too much time.
String queryString = "from document as doc where doc.name=?";
return getHibernateTemplate().find(queryString, "cloud");
This simple query, which fetches 200 records, takes 8-10 seconds.
A first step you can take to solve this issue is to gather more information by setting "hibernate.show_sql" to "true" in your configuration files in order to see exactly what SQL is generated. This will let you see and test the generated queries to isolate the source of the problem.
My best guess without more information is that this statement is triggering eager fetching for a large number of records. Overuse of eager fetching is a common mistake that can significantly slow down Hibernate applications. Hibernate's eager fetching can be very inefficient, retrieving records one at a time and running large numbers of queries against the database.

solr indexing performance with mysql driver

So, using Solr 4.0.
I have a fairly straightforward setup of an entity with 1 sub-entity (1:N relation).
The data to import sits on a MySQL server.
The main table has about 30 million records.
The sub table has about 5 million records (most parent entities don't have the sub-entity; the rest generally have a single one).
I am running into rather horrible indexing (importing) performance: about 80 entities (docs) per second. So indexing this table will in theory take a few days.
From what Solr reports, if I tell it to index the first 1000 entities, for example, it actually issues 1000+ queries to SQL. I have also tried setting the batchSize property for the data source with no luck... only -1 works (otherwise I get an out-of-memory exception).
I'm really not sure what I can do to optimize this. Is there no PROPER data importer for MySQL?
You could use CachedSqlEntityProcessor so that the sub-entity query at least is cached...
Though the cachedEntity approach helped me with another issue, I have found that using nested entities is usually just not the way to go.
The logic of firing the sub-entity query for each "root" entity is just never going to work.
I've rewritten my statements as a SQL JOIN that fetches both root and sub-entities as a single row, mapped the fields accordingly, and performance improved significantly.

Hibernate running random queries

All, when I first call buildSessionFactory, Hibernate seems to be running a bunch of queries on my DB. They vary from "select"s to "insert"s. Why is this and how can I stop it?
Edit: After some review, no, the queries are not random. They seem to be inserts, selects and deletes on the tables in my DB. It almost looks like Hibernate is inserting a few records, running selects to make sure they were inserted, and then deleting them.
When the session factory starts, it generates and caches a standard set of CRUD queries for the mapped objects. What you're seeing is (probably) just the logging of this query generation activity.
