I'm trying to cut down the number of n+1 selects incurred by my application, the application uses EclipseLink as an ORM and in as many places as possible I've tried to add the batch read hint to queries. In a large number of places in the app I don't always know exactly what relationships I'll be traversing (My view displays fields based on user preferences). At that point I'd like to run one query to populate all of those relationships for my objects.
My dream is to call something like ReadAllRelationshipsQuery(Collection,RelationshipName) and populate all of these items so that later calls to:
Collection.get(0).getMyStuff will already be populated and not cause a db query. How can I accomplish this? I'm willing to write any code I need to but I can't find a way that work with the eclipselink framework?
Why don't I just batch read all of the possible fields and let them load lazily? What I've found is that the batch value holders that implement batch reads don't behave well with the eclipselink cache. If a batch read value holder isn't "evaluated" and ends up in the eclipse link cache it can become stale and return incorrect data (This behavior was logged as an eclipselink bug but rejected...)
edit: I found the link to the bug here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=326197
How do I avoid N+1 selects for objects I already have a reference to?
You have three basic ways to load data into objects from a JPA-based solution. These are:
Load dynamically by object traversal (e.g. myObject.getMyCollection().get()).
Load graphs of objects by prefetching dynamically using JPA QL (e.g. FETCH JOINs as described at the Oracle JPA tutorial )
Load by setting the fetch mode ( Is there a way to change the JPA fetch type on a method? )
Each of these has pros and cons.
Loading dynamically by object transversal will generate more (highly targeted queries). These queries are usually small (not large SQL statements, but may load lots of data) and tend to play nicely with a second level cache, but you can get lots and lots of little queries.
Prefetching with JPA QL will give you exactly what you want, but that assumes that you know what you want.
Setting the fetch mode to EAGER will load lots and lots of data for you automatically, but depending on the configuration and usage this may not actually help much (or may make things a lot worse) as you may wind up dragging a LOT of data from the DB into your app that you didn't expect.
Regardless, I highly recommend using p6spy ( http://sourceforge.net/projects/p6spy/ ) in conjunction with any JPA-based application to understand the effects of your tuning.
Unfortunately, JPA makes some things easy and some things hard - mainly, side-effects of your usage. For example, you might fix one problem by setting the fetch mode to eager, and then create another problem where the eager fetch pulls in too much data. EclipseLink does provide tooling to help sort this out ( EclipseLink Performance Tools )
In theory, if you wanted to you could write a generic JavaBean property walker by using something like Apache BeanUtils. Usually just calling a method like size() on a collection is enough to force it to load (although using a collection batch fetch size might complicate things a bit).
One thing to pay particular attention to is the scope of your session and your use of caches (EclipseLink cache).
Something not clear from your post is the scope of a session. Is a session a one shot affair (e.g. like a web page request) or is it a long running thing (e.g. like a classic client/server GUI app)?
It is very difficult to optimize the retrieval of relationships if you do not know what relationships you require.
If you application is requesting what relationships it wants, then you must know at some level which relationships you require, and should be able to optimize these in your query for the objects.
For an overview of relationship optimization techniques see,
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
For Batch Fetching, there are three types, JOIN, EXISTS, and IN. The problem you outlined of changes to data affecting the original query for cache batched relationships only applies to JOIN and EXISTS, and only when you have a selection criteria based on updateale fields, (if the query you are optimizing is on id, or all instances you are ok). IN batch fetching does not have this issue, so you can use IN batch fetching for all the relationships and not have this issue.
ReadAllRelationshipsQuery(Collection,RelationshipName)
How about,
Query query = em.createQuery("Select o from MyObject o where o.id in :ids");
query.setParameter(ids, ids);
query.setHint("eclipselink.batch", relationship);
If you know all possible relations and the user preferences, why don't you just dynamically build the JPQL string (or Criteria) before executing it?
Like:
String sql = "SELECT u FROM User u"; //use a StringBuilder, this is just for simplity's sake
if(loadAdress)
{
sql += " LEFT OUTER JOIN u.address as a"; //fetch join and left outer join have the same result in many cases, except that with left outer join you could load associations of address as well
}
...
Edit: Since the result would be a cross product, you should then iterate over the entities and remove duplicates.
In the query, use FETCH JOIN to prefetch relationships.
Keep in mind that the resulting rows will be the cross product of all rows selected, which can easily be more work than the N+1 queries.
Related
For my website, I'm creating a book database. I have a catalog, with a root node, each node have subnodes, each subnode has documents, each document has versions, and each version is made of several paragraphs.
In order to create this database the fastest possible, I'm first creating the entire tree model, in memory, and then I call session.save(rootNode)
This single save will populate my entire database (at the end when I'm doing a mysqldump on the database it weights 1Go)
The save coasts a lot (more than an hour), and since the database grows with new books and new versions of existing books, it coasts more and more. I would like to optimize this save.
I've tried to increase the batch_size. But it changes nothing since it's a unique save. When I mysqldump a script, and I insert it back into mysql, the operation coast 2 minutes or less.
And when I'm doing a "htop" on the ubuntu machine, I can see the mysql is only using 2 or 3 % CPU. Which means that it's hibernate who's slow.
If someone could give me possible techniques that I could try, or possible leads, it would be great... I already know some of the reasons, why it takes time. If someone wants to discuss it with me, thanks for his help.
Here are some of my problems (I think): For exemple, I have self assigned ids for most of my entities. Because of that, hibernate is checking each time if the line exists before it saves it. I don't need this because, the batch I'm executing, is executed only one, when I create the databse from scratch. The best would be to tell hibernate to ignore the primaryKey rules (like mysqldump does) and reenabeling the key checking once the database has been created. It's just a one shot batch, to initialize my database.
Second problem would be again about the foreign keys. Hibernate inserts lines with null values, then, makes an update in order to make foreign keys work.
About using another technology : I would like to make this batch work with hibernate because after, all my website is working very well with hibernate, and if it's hibernate who creates the databse, I'm sure the naming rules, and every foreign keys will be well created.
Finally, it's a readonly database. (I have a user database, which is using innodb, where I do updates, and insert while my website is running, but the document database is readonly and mYisam)
Here is a exemple of what I'm doing
TreeNode rootNode = new TreeNode();
recursiveLoadSubNodes(rootNode); // This method creates my big tree, in memory only.
hibernateSession.beginTrasaction();
hibernateSession.save(rootNode); // during more than an hour, it saves 1Go of datas : hundreads of sub treeNodes, thousands of documents, tens of thousands paragraphs.
hibernateSession.getTransaction().commit();
It's a little hard to guess what could be the problem here but I could think of 3 things:
Increasing batch_size only might not help because - depending on your model - inserts might be interleaved (i.e. A B A B ...). You can allow Hibernate to reorder inserts and updates so that they can be batched (i.e. A A ... B B ...).Depending on your model this might not work because the inserts might not be batchable. The necessary properties would be hibernate.order_inserts and hibernate.order_updates and a blog post that describes the situation can be found here: https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
If the entities don't already exist (which seems to be the case) then the problem might be the first level cache. This cache will cause Hibernate to get slower and slower because each time it wants to flush changes it will check all entries in the cache by iterating over them and calling equals() (or something similar). As you can see that will take longer with each new entity that's created.To Fix that you could either try to disable the first level cache (I'd have to look up whether that's possible for write operations and how this is done - or you do that :) ) or try to keep the cache small, e.g. by inserting the books yourself and evicting each book from the first level cache after the insert (you could also go deeper and do that on the document or paragraph level).
It might not actually be Hibernate (or at least not alone) but your DB as well. Note that restoring dumps often removes/disables constraint checks and indices along with other optimizations so comparing that with Hibernate isn't that useful. What you'd need to do is create a bunch of insert statements and then just execute those - ideally via a JDBC batch - on an empty database but with all constraints and indices enabled. That would provide a more accurate benchmark.
Assuming that comparison shows that the plain SQL insert isn't that much faster then you could decide to either keep what you have so far or refactor your batch insert to temporarily disable (or remove and re-create) constraints and indices.
Alternatively you could try not to use Hibernate at all or change your model - if that's possible given your requirements which I don't know. That means you could try to generate and execute the SQL queries yourself, use a NoSQL database or NoSQL storage in a SQL database that supports it - like Postgres.
We're doing something similar, i.e. we have Hibernate entities that contain some complex data which is stored in a JSONB column. Hibernate can read and write that column via a custom usertype but it can't filter (Postgres would support that but we didn't manage to enable the necessary syntax in Hibernate).
I have problems with full loading of very complex object from DB in a reasonable time and with reasonable count of queries.
My object has a lot of embedded entities, each entity has references to another entities, another entities references yet another and so on (So, the nesting level is 6)
So, I've created example to demonstrate what I want:
https://github.com/gladorange/hibernate-lazy-loading
I have User.
User has #OneToMany collections of favorite Oranges,Apples,Grapevines and Peaches. Each Grapevine has #OneToMany collection of Grapes. Each fruit is another entity with just one String field.
I'm creating user with 30 favorite fruits of each type and each grapevine has 10 grapes. So, totally I have 421 entity in DB - 30*4 fruits, 100*30 grapes and one user.
And what I want: I want to load them using no more than 6 SQL queries.
And each query shouldn't produce big result set (big is a result set with more that 200 records for that example).
My ideal solution will be the following:
6 requests. First request returns information about user and size of result set is 1.
Second request return information about Apples for this user and size of result set is 30.
Third, Fourth and Fifth requests returns the same, as second (with result set size = 30) but for Grapevines, Oranges and Peaches.
Sixth request returns Grape for ALL grapevines
This is very simple in SQL world, but I can't achieve such with JPA (Hibernate).
I tried following approaches:
Use fetch join, like from User u join fetch u.oranges .... This is awful. The result set is 30*30*30*30 and execution time is 10 seconds. Number of requests = 3. I tried it without grapes, with grapes you will get x10 size of result set.
Just use lazy loading. This is the best result in this example (with #Fetch=
SUBSELECT for grapes). But in that case that I need to manually iterate over each collection of elements. Also, subselect fetch is too global setting, so I would like to have something which could work on query level. Result set and time near ideal. 6 queries and 43 ms.
Loading with entity graph. The same as fetch join but it also make request for every grape to get it grapevine. However, result time is better (6 seconds), but still awful. Number of requests > 30.
I tried to cheat JPA with "manual" loading of entities in separate query. Like:
SELECT u FROM User where id=1;
SELECT a FROM Apple where a.user_id=1;
This is a little bit worse that lazy loading, since it requires two queries for each collection: first query to manual loading of entities (I have full control over this query, including loading associated entities), second query to lazy-load the same entities by Hibernate itself (This is executed automatically by Hibernate)
Execution time is 52, number of queries = 10 (1 for user, 1 for grape, 4*2 for each fruit collection)
Actually, "manual" solution in combination with SUBSELECT fetch allows me to use "simple" fetch joins to load necessary entities in one query (like #OneToOne entities) So I'm going to use it. But I don't like that I have to perform two queries to load collection.
Any suggestions?
I usually cover 99% of such use cases by using batch fetching for both entities and collections. If you process the fetched entities in the same transaction/session in which you read them, then there is nothing additionally that you need to do, just navigate to the associations needed by the processing logic and the generated queries will be very optimal. If you want to return the fetched entities as detached, then you initialize the associations manually:
User user = entityManager.find(User.class, userId);
Hibernate.initialize(user.getOranges());
Hibernate.initialize(user.getApples());
Hibernate.initialize(user.getGrapevines());
Hibernate.initialize(user.getPeaches());
user.getGrapevines().forEach(grapevine -> Hibernate.initialize(grapevine.getGrapes()));
Note that the last command will not actually execute a query for each grapevine, as multiple grapes collections (up to the specified #BatchSize) are initialized when you initialize the first one. You simply iterate all of them to make sure all are initialized.
This technique resembles your manual approach but is more efficient (queries are not repeated for each collection), and is more readable and maintainable in my opinion (you just call Hibernate.initialize instead of manually writing the same query that Hibernate generates automatically).
I'm going to suggest yet another option on how to lazily fetch collections of Grapes in Grapevine:
#OneToMany
#BatchSize(size = 30)
private List<Grape> grapes = new ArrayList<>();
Instead of doing a sub-select this one would use in (?, ?, etc) to fetch many collections of Grapes at once. Instead ? Grapevine IDs will be passed. This is opposed to querying 1 List<Grape> collection at a time.
That's just yet another technique to your arsenal.
I do not quite understand your demands here. It seems to me you want Hibernate to do something that it's not designed to do, and when it can't, you want a hack-solution that is far from optimal. Why not loosen the restrictions and get something that works? Why do you even have these restrictions in the first place?
Some general pointers:
When using Hibernate/JPA, you do not control the queries. You are not supposed to either (with a few exceptions). How many queries, the order they are executed in, etc, is pretty much beyond your control. If you want complete control of your queries, just skip JPA and use JDBC instead (Spring JDBC for instance.)
Understanding lazy-loading is key to making decisions in these type of situation. Lazy-loaded relations are not fetched when getting the owning entity, instead Hibernate goes back to the database and gets them when they are actually used. Which means that lazy-loading pays off if you don't use the attribute every time, but has a penalty the times you actually use it. (Fetch join is used for eager-fetching a lazy relation. Not really meant for use with regular load from the database.)
Query optimalization using Hibernate should not be your first line of action. Always start with your database. Is it modelled correctly, with primary keys and foreign keys, normal forms, etc? Do you have search indexes on proper places (typically on foreign keys)?
Testing for performance on a very limited dataset probably won't give the best results. There probably will be overhead with connections, etc, that will be larger than the time spent actually running the queries. Also, there might be random hickups that cost a few milliseconds, which will give a result that might be misleading.
Small tip from looking at your code: Never provide setters for collections in entities. If actually invoked within a transaction, Hibernate will throw an exception.
tryManualLoading probably does more than you think. First, it fetches the user (with lazy loading), then it fetches each of the fruits, then it fetches the fruits again through lazy-loading. (Unless Hibernate understands that the queries will be the same as when lazy loading.)
You don't actually have to loop through the entire collection in order to initiate lazy-loading. You can do this user.getOranges().size(), or Hibernate.initialize(user.getOranges()). For the grapevine you would have to iterate to initialize all the grapes though.
With proper database design, and lazy-loading in the correct places, there shouldn't be a need for anything other than:
em.find(User.class, userId);
And then maybe a join fetch query if a lazy load takes a lot of time.
In my experience, the most important factor for speeding up Hibernate is search indexes in the database.
I'm writing this on the fly on my phone, so forgive the crappy code samples.
I have entities with a manytomany relationship:
#JoinTable(name="foo", #JoinColum="...", #InverseJoinColumn="...")
#ManyToMany
List list = new ArrayList();
I want their data to be retrieved in a paginated way.
I know about setFirstResult and setMaxResults. Is there a way to use this with the mapping? As in, I retrieve the object and get the list filled with contents equal to the amount of records for a single page, with the appropriate offset.
I guess I'm just unclear of the best way to do this. I could just manually use hibernate criteria to have the effect, but I feel thats missing the API. I have this mapping, I want to see if there's a way to use it in a paginated way.
PS. If this is impractical, just say. Also, if it is, can I still use the mapping to add new entries to the join table. As in, if the entity is a persisted entity in the DB, but I haven't fetched the manytomany list, can I add something new to it and when its persisted with cascade all it'll be added to the join table without clearing the other entries?
The type of the relationship between entities that are part of your query isn't that important. There are a couple of ways to tackle this.
If your database supports the LIMIT keyword in it's queries, you would be able to use it to get data sets, assuming you sort your data. Note that if your data changes while your user is navigating between pages, you might see some duplication or miss some records. You'll be stuck having to rewrite if your database changes to one that doesn't have the LIMIT keyword.
If you need to freeze the data at the point of the original query you need to use a 3rd party framework or write your own to fetch a list of Ids for your query then split up that list and fetch by id in a subset for pagination. This is more reliable can be made to work for any database.
Displaytag is a data paging framework I've used and that I therefore can tell you works well for large datasets. It's also one of the older solutions for this problem and is not part of an extended framework.
http://displaytag.sourceforge.net/11/tut_externalSortAndPage.html
Table sorter is another one I came across. This one uses JQuery and fetches the entire data set in one query, so strictly speaking it doesn't meet your "fetches the data in a paginated way" criteria. (This might not be appropriate for large sets).
http://tablesorter.com/docs/
This tutorial might be helpful:
http://theopentutorials.com/examples/java-ee/jsp/pagination-in-servlet-and-jsp/
If you're already using a framework take a look at whether that framework has tackled pagination:
Spring MVC provides a data pager
http://blog.fawnanddoug.com/2012/05/pagination-with-spring-mvc-spring-data.html
GWT provides a data pager:
http://www.gwtproject.org/javadoc/latest/com/google/gwt/user/cellview/client/SimplePager.html
The following refrences might be helpful too:
JDBC Pagination
which also points to:
http://java.avdiel.com/Tutorials/JDBCPaging.html
I have a query like this
Select Folder.name from FROM FolderTable,ValidFolder, ValidFolderGroup, ValidUser,
ValidLocation, ValidDepartment where ValidUser.LocationCode *= ValidLocation.LocationCode
and ValidUser.DepartmentCode *= ValidDepartment.DepartmentCode and Folder.IssueUser =
ValidUser.UserId and ValidFolder.FolderType = Folder.FolderType and
ValidFolderGroup.FolderGroupCode = ValidFolder.FolderGroupCode and
ValidFolderGroup.GroupTypeCode = 13 and (ValidUser.UserId='User' OR
ValidUser.ManagerId='User') and ValidFolderGroup.GroupTypeCode = 13 and
Folder.IssueUser = 'User'
Now here all the table which start with Valid are cache table so these table already contains data .
Suppose if someone using JOOQ or Hibernate which one will be the best option
Use query as written above with all Joins?
Or Use Java code to fulfill the requirement rather than join because as user using Hibernate or JOOQ it already have Java class for the table and Valid table have already all the data ?
Okay, you're probably not going to like this answer, but the best way to do this is not to keep Valid "cached".
The best solution in my opinion would be to use jOOQ (if you prefer DSL) or Hibernate (if you prefer OR mapping) and query the Database every time, and consistently use the DAO pattern.
The jOOQ and Hibernate guys are almost certainly better at SQL than you are. We've used jOOQ and Hibernate in really large enterprise projects, and they both perform exceptionally. Particularly with a good connection pool like BoneCP. If after you've got that setup running, and running well, but still think you may have performance issues, you can always add a cache (like EhCache) afterwards.
Ultimately tho', I'm making a lot of assumptions about your software, namely that
There are more people than you working on it, and
It has to be maintained. If neither of these assumptions are true, then you can safely disregard this answer.
General answer:
Modern databases are incredibly good at optimising your query and choosing the best possible execution plan for you. Given your outer join notation using *=, you're obviously using SQL Server, so that's a pretty good database.
Even if you already have much of the "Valid" data in your application memory, chances are that your database also already has the same data in a buffer cache and thus the database doesn't need to hit the disk again for the various joins in your query.
In fact, depending on the nature of your data, the database might even assess that some of your joins are unneeded (if you have the right meta data, like constraints).
Specific answer:
In your particular case, it looks as though you can indeed strip most of your query yourself and query only the Folder table using search criteria from your application's "Valid" cache. I'm saying that it looks like it, because I don't fully understand the business logic behind those joins and whether they're all modelling 1:1 relationships, or whether removing them will change the semantics of the query.
So, technically, it's possible that you can remove the joins, but if you want to stay on the safe side, just keep things as they are as you migrate to jOOQ or Hibernate.
Alternative 3:
Of course, instead of tampering with this query, you might even be able to remove this query and fetch the Folder.name property already in your previous queries when you load the "Valid" content into memory.
Ever heard of views? Look into them, you'll be amazed.
Apart from that, it's impossible to say what you should do, there's no "best" and you provide way too little information to even make an educated guess about your specific requirements.
But, I'd not hard code things like database IDs in a query that ends up inside any program, far too prone to cause problems in the (near) future.
I'm working on a web based application that belongs to an automobil manufacturer, developed in Spring-Hibernate with MS SQL Server 2005 database.
There are three kind of use cases:
1) Through this application, end users can request for creating a Car, Bus, Truck etc through web based interfaces. When a user logs in, a HTML form gets displayed for capturing technical specification of vehicle, for ex, if someone wanted to request for Car, he can speify the Engine Make/Model, Tire, Chassis details etc and submit the form. I'm using Hibernate here for persistence, i.e. I've a Car Entity that gets saved in DB for each such request.
2) This part of the application deals with generation of reports. These reports mainly dela with number of requests received in a day and the summary. Some of the reports calculate Turnaround time for individual Create vehicle requests.
I'm using plain JDBC calls with Preparedstatement (if report can be generated with SQLs), Callablestatement (if report is complex enough and needs a DB procedure/Function to fetch all details) and HibernateCallback to execute the SQLs/Procedures and display information on screen.
3) Search: This part of application allows ensd users to search for various requests data, i.e. how many vehicle have been requested in a Year etc. I'm using DB procedure with CallableStatement..Once again executing these procedures within HibernateCallback, populating and returning search result on GUI in a POJO.
I'm using native SQL in (2) and (3) above, because for the reporting/search purpose the report data structure to display on screen is not matching with any of my Entity. For ex: Car entity has got more than 100 attributes in itself, but for reporting purpose I don't need more than 10 of them.. so i just though loading all 100 attributes does not make any sense, so why not use plain SQL and retrieve just the data needed for displaying on screen.
Similarly for Search, I had to write procedures/Functions because search algorithm is not straight forward and Hibernate has no way to write a stored procedure kind of thing.
This is working fine for proto type, however I would like to know
a. If my approach for using native SQLs and DB procedures are fine for case 2 and 3 based on my judgement.
b. Also whether executing SQLs in HibernateCallback is correct approach?
Need expert's help.
I would like to know (...) if my approach for using native SQLs and DB procedures are fine for case 2 and 3 based on my judgment
Nothing forces your to use a stored procedure for case 2, you could use HQL and projections as already pointed out:
select f.id, f.firstName from Foo f where ...
Which would return an Object[] or a List<Object[]> depending on the where condition.
And if you want type safe results, you could use a SELECT NEW expression (assuming you're providing the appropriate constructor):
select new Foo(f.id, f.firstName) from Foo f
And you can even return non entities
select new com.acme.LigthFoo(f.id, f.firstName) from Foo f
For case 3, the situation seems different. Just in case, note that the Criteria API is more appropriate than HQL to build dynamic queries. But it looks like this won't help here.
I would like to know (...) whether executing SQLs in HibernateCallback is correct approach?
First of all, there are several restrictions when using stored procedures and I prefer to avoid them when possible. Secondly, if you want to return entities, it isn't the only way and simplest solution as we saw. So for case 2, I would consider using HQL.
For case 3, since you aren't returning entities at all, I would consider not using Hibernate API but the JDBC support from Spring which offers IMHO a cleaner API than Session#connection() and the HibernateCallback.
More interesting readings:
References
Hibernate Core reference guide
14.6. The select clause (about the select new)
16.1.5. Returning non-managed entities (about ResultTransformer)
16.2.2. Using stored procedures for querying
Resources
Hibernate 3.2: Transformers for HQL and SQL
Related questions
hibernate SQLquery extract variable
hibernate query language or using criteria
You should strive to use as much HQL as possible, unless you have a good argument (like performance, but do a benchmark first). If the use of native queries becomes to excessive, you should consider whether Hibernate has been a good choice.
Note a few things:
you can have native queries and stored procedures that result in Hibernate entities. You just have to map the query / storproc call to a class and call it by session.createSQLQuery(queryName)
If you really need to construct native queries at runtime, the newest version of hibernate have a doWork(..) method, by which you can do JDBC work.
You say
For ex: Car entity has got more than 100 attributes in itself, but for reporting purpose I don't need more than 10 of them.. so i just though loading all 100 attributes does not make any sense
but HQL in hibernate allows you to do a projection (select only a subset of the columns back). You don't have to pull the entire entity if you don't want to.
Then you get all the benefits of HQL (typing of results, HQL join syntax) but you can pretty much write SQLish code.
See here for the HQL docs and here for the select syntax. If you're used to SQL it's pretty easy.
So to answer you directly
a - No, I think you should be using HQL
b - Becomes irrelevant if you go with my suggestion for a.