Figure out what Tables a JOOQ Function depends on

Figure out what Tables a JOOQ Function depends on - java

I was trying to dynamically add some joins to my JOOQ query only when the select statement included fields that required those tables to be part of the query. So I need to know what tables a given Field depends on.
This is easy for TableField's via TableField.getTable().
It looked easy for Function's too - with Function.getArguments() I could recursively walk the function tree until I reached all the TableField leaf nodes.
Sadly for me, however, Function class is package private.
I also didn't see any static utilities in JOOQ that could figure this out for me.
Wonder if there's a way to do this short of dirty regex against generated sql, or sneaky reflection.

The "official" way to do this kind of SQL transformation would be by using the VisitListener SPI and to hook into the query rendering lifecycle, collecting all distinct tables from the SELECT clause, adding them to the FROM clause via the most appropriate join.
This is the only way to get access to the arguments of the internal Function type, short of hacking stuff together via regexes or reflection.
Having said so: Be wary of doing this
You'll need to take care of any of these things (which is probably impossible):
What happens when you self-join a table?
How do you know whether to inner or outer join two tables?
How do you know if you can reuse a table that is already in the FROM clause, or if you have to join it again?
How do you know the "optimal" path from table A to table B if there are several possible ways to join them via foreign key?

Related

Hibernate Search query for class

I'm using hibernate search 4.4.0. And I met a problem recently.
E.g, I have 2 classes INDEXING and DATA_PROPERTY. There is no association between 2 of them. And I can't change them or creat a new class to associate 2 of them.
Part of Lucene indexing:
mapping.entity(DatatypeProperty.class).indexed().providedId()
.property("rdfResource",ElementType.FIELD).field().analyze(Analyze.NO).store(Store.YES)
.property("partitionValue", ElementType.FIELD).field().analyze(Analyze.NO)
mapping.entity(Indexing.class).indexed().providedId()
.property("rdfResource",ElementType.FIELD).field().analyze(Analyze.NO).store(Store.YES)
Now in the SQL, I use
SELECT IND.RDF_RESOURCE
FROM INDEXING IND, DATA_PROPERTY DP
WHERE IND.RDF_RESOURCE = DP.RDF_RESOURCE
AND IND.OBJECT_TYPE_ID_INDEXED IN (........)
AND DP.PARTITION_VALUE IN (......)
AND .......
How can I translate IND.RDF_RESOURCE = DP.RDF_RESOURCE in Hibernate Search???
I thought maybe I can use the query to find all the RDF_RESOURCE of class DatatypeProperty and matching all of them in the query for class Indexing. But it seems very inefficiency.
Does anyone has a better way for this?

I have 2 classes INDEXING and DATA_PROPERTY. There is no association
between 2 of them. And I can't change them or create a new class to
associate 2 of them.
In this case you are between a rock and a hard place. You will need to associate the records somehow and the most obvious choice is via an association. Also, you cannot compare a SQL join with a free text based index provided by Lucene.
One potential solution could be to write a custom bridge which at indexing time executes the join and indexes the relevant data, so that you can target it directly via your query. Whether this works for you will depend on your use case. In your example setup, I don't see any field which would benefit from free text search. I can only assume that you are only showing parts of your code. If not, why don't you just stick with SQL?

Hibernate produce different SQL for every query

I've just tested my application under the profiler and found out that sql strings use about 30% of my memory! This is bizarre.
There are a lot of strings like this stored in app memory. This is SQL queries generated by hibernate, note the different numbers and trailing underscores:
select avatardata0_.Id as Id4305_0_,...... where avatardata0_.Id=? for update
select avatardata0_.Id as Id4347_0_,...... where avatardata0_.Id=? for update
Here is the part I can't understand. Why does hibernate have to generate different sql strings with different identifiers like "Id4305_0_" for each query? Why can't it use one query string for all identical queries? Is this some kind of trick to bypass query caching?
I would greatly appreciate if someone would describe me why it happening and how to avoid such resource wasting.
UPDATE
Ok. I found it. I was wrong assuming memory leak, It was my fault. Hibernate is working as intended.
My app created 121(!) SessionFactories in 10 threads, they produced about 2300 instances of SingleTableEntityPersisters. And each SingleTableEntityPersister generates about 15 SQL queries with different identifiers. Hibernate was forced to generate about 345.000 different SQL queries. Everything is fine, nothing weird :)

There is a logic behind the query string that hibernate generates. Its primary aim is to get unique aliases for tables and columns names.
From your query,
select avatardata0_.Id as Id4305_0_,...... where avatardata0_.Id=?
avatardata0_ ==> avatardata is the alias of the table and 0_ is appended to indicate it is the first table in the query. So if it were the second table(or Entity) in the query it should have been shown as avatardata1_. It uses the same logic for the column aliases.
So, this way all the possible conflicts are avoided.
You are seeing theses queries because you have turns on the show_sql flag the configuration. This is intended for the debugging of queries. Once you application started working you are supposed turn it off.
Read more on the API docs here.
I am not much aware of the memory consumption part, but you repeat your tests with the above flag turned off and see if there is any improvement.

Assuming you are using sql server, you might want to check the parameter type declaration for '?', making sure the declaration results in the same, fixed length declaration every time.
Dynamic length parameters would result in separate execution plans for each query. This could possibly comsume a lot of resources. What we see as the same procedure, get's interpreted by sql server as a different query, rendering a separate execution plan.
Thus,
exec myprocedure #p1 varchar(3)='foo'
and
exec myprocedure #p1 varchar(6)='foobar'
would result in different plans. Simply by the fact that the declarations of #p1, differ in size.
There is a lot to know about this behaviour. If the above applies to you, I would recommend you read up on 'parameter sniffing'.

No... you can generate you common query inside the hibernate. The logic behind is to mapping with table and fetch the record from there. It is used common query for all the database. Please create a common query like that :
Example :
select t.Id as Id4305_0_,...... from t where t.Id=?

Avoiding N+One selects and Invalid results from eclipselink with batch read

I'm trying to cut down the number of n+1 selects incurred by my application, the application uses EclipseLink as an ORM and in as many places as possible I've tried to add the batch read hint to queries. In a large number of places in the app I don't always know exactly what relationships I'll be traversing (My view displays fields based on user preferences). At that point I'd like to run one query to populate all of those relationships for my objects.
My dream is to call something like ReadAllRelationshipsQuery(Collection,RelationshipName) and populate all of these items so that later calls to:
Collection.get(0).getMyStuff will already be populated and not cause a db query. How can I accomplish this? I'm willing to write any code I need to but I can't find a way that work with the eclipselink framework?
Why don't I just batch read all of the possible fields and let them load lazily? What I've found is that the batch value holders that implement batch reads don't behave well with the eclipselink cache. If a batch read value holder isn't "evaluated" and ends up in the eclipse link cache it can become stale and return incorrect data (This behavior was logged as an eclipselink bug but rejected...)
edit: I found the link to the bug here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=326197
How do I avoid N+1 selects for objects I already have a reference to?

You have three basic ways to load data into objects from a JPA-based solution. These are:
Load dynamically by object traversal (e.g. myObject.getMyCollection().get()).
Load graphs of objects by prefetching dynamically using JPA QL (e.g. FETCH JOINs as described at the Oracle JPA tutorial )
Load by setting the fetch mode ( Is there a way to change the JPA fetch type on a method? )
Each of these has pros and cons.
Loading dynamically by object transversal will generate more (highly targeted queries). These queries are usually small (not large SQL statements, but may load lots of data) and tend to play nicely with a second level cache, but you can get lots and lots of little queries.
Prefetching with JPA QL will give you exactly what you want, but that assumes that you know what you want.
Setting the fetch mode to EAGER will load lots and lots of data for you automatically, but depending on the configuration and usage this may not actually help much (or may make things a lot worse) as you may wind up dragging a LOT of data from the DB into your app that you didn't expect.
Regardless, I highly recommend using p6spy ( http://sourceforge.net/projects/p6spy/ ) in conjunction with any JPA-based application to understand the effects of your tuning.
Unfortunately, JPA makes some things easy and some things hard - mainly, side-effects of your usage. For example, you might fix one problem by setting the fetch mode to eager, and then create another problem where the eager fetch pulls in too much data. EclipseLink does provide tooling to help sort this out ( EclipseLink Performance Tools )
In theory, if you wanted to you could write a generic JavaBean property walker by using something like Apache BeanUtils. Usually just calling a method like size() on a collection is enough to force it to load (although using a collection batch fetch size might complicate things a bit).
One thing to pay particular attention to is the scope of your session and your use of caches (EclipseLink cache).
Something not clear from your post is the scope of a session. Is a session a one shot affair (e.g. like a web page request) or is it a long running thing (e.g. like a classic client/server GUI app)?

It is very difficult to optimize the retrieval of relationships if you do not know what relationships you require.
If you application is requesting what relationships it wants, then you must know at some level which relationships you require, and should be able to optimize these in your query for the objects.
For an overview of relationship optimization techniques see,
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
For Batch Fetching, there are three types, JOIN, EXISTS, and IN. The problem you outlined of changes to data affecting the original query for cache batched relationships only applies to JOIN and EXISTS, and only when you have a selection criteria based on updateale fields, (if the query you are optimizing is on id, or all instances you are ok). IN batch fetching does not have this issue, so you can use IN batch fetching for all the relationships and not have this issue.
ReadAllRelationshipsQuery(Collection,RelationshipName)
How about,
Query query = em.createQuery("Select o from MyObject o where o.id in :ids");
query.setParameter(ids, ids);
query.setHint("eclipselink.batch", relationship);

If you know all possible relations and the user preferences, why don't you just dynamically build the JPQL string (or Criteria) before executing it?
Like:
String sql = "SELECT u FROM User u"; //use a StringBuilder, this is just for simplity's sake
if(loadAdress)
{
sql += " LEFT OUTER JOIN u.address as a"; //fetch join and left outer join have the same result in many cases, except that with left outer join you could load associations of address as well
}
...
Edit: Since the result would be a cross product, you should then iterate over the entities and remove duplicates.

In the query, use FETCH JOIN to prefetch relationships.
Keep in mind that the resulting rows will be the cross product of all rows selected, which can easily be more work than the N+1 queries.

Hibernate: Sorting a list without 'touching' the database

I've got a many-to-many relationship in my database and I'm using Hibernate to retrieve a single row from the left hand side of the relationship. I then just call the getter method to retrieve the associated right hand side of the relationship (lazy fetch).
As part of my work, I need to sort the right hand side "list" object by doing this:
Collections.sort(list);
When I'm done my work, I'm currently calling:
session.getTransaction().commitTransaction();
Even though I haven't actually changed anything in the database in terms of the data, I can see in the logs that some INSERT statements were triggered.
What should I be doing in this situation so that I could order the list without incurring a database hit?

There are few ways to do it.
First, you can use #OrderBy annotation on your collection definition. Refer to the link here. In this way, Hibernate will append the query with order by clause.
Second, if you are using Criteria API to query the DB, you can use addOrder() method to add the order by clause.
Third, you can directly use order by clause in your HQL.
Fourth, you can do it in your code, using Comparator. You can do it even after closing the session/transaction. In this case, you don't need an active transaction; but this could mean sorting partial data, because most of the time we don't fetch complete thing -- for performance reasons. In your case, it seems something has changed in the midst. However, try sorting it after closing the transaction.

Is HibernateCallback best for executing SQL/procedures?

I'm working on a web based application that belongs to an automobil manufacturer, developed in Spring-Hibernate with MS SQL Server 2005 database.
There are three kind of use cases:
1) Through this application, end users can request for creating a Car, Bus, Truck etc through web based interfaces. When a user logs in, a HTML form gets displayed for capturing technical specification of vehicle, for ex, if someone wanted to request for Car, he can speify the Engine Make/Model, Tire, Chassis details etc and submit the form. I'm using Hibernate here for persistence, i.e. I've a Car Entity that gets saved in DB for each such request.
2) This part of the application deals with generation of reports. These reports mainly dela with number of requests received in a day and the summary. Some of the reports calculate Turnaround time for individual Create vehicle requests.
I'm using plain JDBC calls with Preparedstatement (if report can be generated with SQLs), Callablestatement (if report is complex enough and needs a DB procedure/Function to fetch all details) and HibernateCallback to execute the SQLs/Procedures and display information on screen.
3) Search: This part of application allows ensd users to search for various requests data, i.e. how many vehicle have been requested in a Year etc. I'm using DB procedure with CallableStatement..Once again executing these procedures within HibernateCallback, populating and returning search result on GUI in a POJO.
I'm using native SQL in (2) and (3) above, because for the reporting/search purpose the report data structure to display on screen is not matching with any of my Entity. For ex: Car entity has got more than 100 attributes in itself, but for reporting purpose I don't need more than 10 of them.. so i just though loading all 100 attributes does not make any sense, so why not use plain SQL and retrieve just the data needed for displaying on screen.
Similarly for Search, I had to write procedures/Functions because search algorithm is not straight forward and Hibernate has no way to write a stored procedure kind of thing.
This is working fine for proto type, however I would like to know
a. If my approach for using native SQLs and DB procedures are fine for case 2 and 3 based on my judgement.
b. Also whether executing SQLs in HibernateCallback is correct approach?
Need expert's help.

I would like to know (...) if my approach for using native SQLs and DB procedures are fine for case 2 and 3 based on my judgment
Nothing forces your to use a stored procedure for case 2, you could use HQL and projections as already pointed out:
select f.id, f.firstName from Foo f where ...
Which would return an Object[] or a List<Object[]> depending on the where condition.
And if you want type safe results, you could use a SELECT NEW expression (assuming you're providing the appropriate constructor):
select new Foo(f.id, f.firstName) from Foo f
And you can even return non entities
select new com.acme.LigthFoo(f.id, f.firstName) from Foo f
For case 3, the situation seems different. Just in case, note that the Criteria API is more appropriate than HQL to build dynamic queries. But it looks like this won't help here.
I would like to know (...) whether executing SQLs in HibernateCallback is correct approach?
First of all, there are several restrictions when using stored procedures and I prefer to avoid them when possible. Secondly, if you want to return entities, it isn't the only way and simplest solution as we saw. So for case 2, I would consider using HQL.
For case 3, since you aren't returning entities at all, I would consider not using Hibernate API but the JDBC support from Spring which offers IMHO a cleaner API than Session#connection() and the HibernateCallback.
More interesting readings:
References
Hibernate Core reference guide
14.6. The select clause (about the select new)
16.1.5. Returning non-managed entities (about ResultTransformer)
16.2.2. Using stored procedures for querying
Resources
Hibernate 3.2: Transformers for HQL and SQL
Related questions
hibernate SQLquery extract variable
hibernate query language or using criteria

You should strive to use as much HQL as possible, unless you have a good argument (like performance, but do a benchmark first). If the use of native queries becomes to excessive, you should consider whether Hibernate has been a good choice.
Note a few things:
you can have native queries and stored procedures that result in Hibernate entities. You just have to map the query / storproc call to a class and call it by session.createSQLQuery(queryName)
If you really need to construct native queries at runtime, the newest version of hibernate have a doWork(..) method, by which you can do JDBC work.

You say
For ex: Car entity has got more than 100 attributes in itself, but for reporting purpose I don't need more than 10 of them.. so i just though loading all 100 attributes does not make any sense
but HQL in hibernate allows you to do a projection (select only a subset of the columns back). You don't have to pull the entire entity if you don't want to.
Then you get all the benefits of HQL (typing of results, HQL join syntax) but you can pretty much write SQLish code.
See here for the HQL docs and here for the select syntax. If you're used to SQL it's pretty easy.
So to answer you directly
a - No, I think you should be using HQL
b - Becomes irrelevant if you go with my suggestion for a.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.