Optimize JPA dynamic count query - java

We have a typical method that returns a paginated result. It uses CriteriaBuilder and performs two queries:
one that counts the total number of results
and another one that returns the subset for the requested page
We have noticed that JPA does not optimize the first query at all, because it uses EXISTS (on Oracle).
Java code:
Root<Foo> from = criteriaQuery.from(Foo.class);
//... predicates
CriteriaQuery<Long> countQuery = criteriaBuilder.createQuery(Long.class)
.select(criteriaBuilder.countDistinct(from))
.where(predicates.toArray(new Predicate[predicates.size()]));
Long numberResults = entityManager.createQuery(countQuery).getSingleResult();
SQL generated query:
SELECT COUNT(t0.REFERENCE)
FROM foo t0
WHERE EXISTS (
SELECT t1.REFERENCE
FROM foo t1
WHERE ((((t0.REFERENCE = t1.REFERENCE) AND (t0.VERSION_NUM = t1.VERSION_NUM)) AND (t0.ISSUER = t1.ISSUER)) AND (t1.REFERENCE LIKE ? AND (t1.VERSION_STATUS = ?)))
);
How do I avoid the EXISTS? Is there something wrong with the Java code?

For several reasons (this issue and this related article enumerate some of them), EclipseLink uses EXISTS in its implementation of the countDistinct operation.
Although I can agree with you, be aware that the performance of EXISTS in Oracle depends heavily on the use case, and it does not have to be poor. Please consider reviewing this classic entry in Tom Kyte's blog.
So my advice is to keep using the generated code and the corresponding SQL.
If you need or want a different approach, a possibly more performant way of counting the records is to fetch the ids of the entities that match the provided predicates (the actual performance depends mostly on these predicates) and count the results in memory, in Java. I mean:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
// I assume reference is String here
CriteriaQuery<String> query = cb.createQuery(String.class);
Root<Foo> root = query.from(Foo.class);
query
.select(root.get("reference"))
.distinct(true)
.where(predicates.toArray(new Predicate[predicates.size()]))
;
List<String> references = entityManager.createQuery(query).getResultList();
int count = references.size();
Although I think it is generally not advisable, if the amount of data is not large you could even fetch the results once from the database and do the paging in memory in Java; this is straightforward with subList, for instance.
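A minimal sketch of that in-memory paging, assuming the full result list has already been fetched once. The class name InMemoryPaging and the helper page are hypothetical names for illustration, and the page index is assumed to be zero-based:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class InMemoryPaging {

    /** Returns the requested zero-based page, or an empty list when the page is out of range. */
    static <T> List<T> page(List<T> all, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // long arithmetic avoids int overflow
        if (from >= all.size()) {
            return List.of();
        }
        int to = (int) Math.min(from + pageSize, all.size());
        // subList is a view on the original list, so no elements are copied
        return all.subList((int) from, to);
    }

    public static void main(String[] args) {
        // Stand-in for the rows fetched once from the database
        List<Integer> rows = IntStream.rangeClosed(1, 25).boxed().collect(Collectors.toList());
        System.out.println("total = " + rows.size());          // the count comes for free
        System.out.println("page 2 = " + page(rows, 2, 10));   // last, partial page
    }
}
```

Because the whole list is in memory, the total count is simply the list size, so no separate count query is needed. This only makes sense while the result set stays small.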
As a final word, AFAIK other JPA providers such as Hibernate implement count in a different way: if switching the JPA provider is an option, you could try using it instead.

With or without EXISTS, the query plans are identical. The only optimisation would be to return COUNT() and the result in the same query, easy to do in SQL with "OVER()". But mapping the Foo.class on a view and adding a transient column to contain the count will complicate a lot of other parts of the application, and mapping the result of paginated queries on a new CountedFoo.class will also complicate the solution.

Related

JOOQ multiple select count in one connection with PostgreSQL

I have a table SUBSCRIPTION, and I want to run multiple selectCount queries written with jOOQ, each with different predicates, over a single database connection.
To do so, I have created a list of queries:
List<Query> countQueries = channels.stream().map(c ->
selectCount().from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.equal(c))
).collect(toList());
Finally, I launched this list of queries using batch:
using(configuration).batch(countQueries).execute();
I expected to get the results of the above queries in the return value of execute, but I get an array of integers filled with 0 values.
Is this the right way to run multiple selectCount queries using jOOQ?
What is the meaning of the integer array returned by the execute method?
I have checked this link on the jOOQ blog about "How to Calculate Multiple Aggregate Functions in a Single Query", but it only covers plain SQL queries, not jOOQ.
Comments on your assumptions
I expected to get the results of the above queries in the return value of execute, but I get an array of integers filled with 0 values.
The batch() API can only be used for DML queries (INSERT, UPDATE, DELETE), just like with native JDBC. I mean, you can run the queries as a batch, but you cannot fetch the results this way.
I have checked this link on the jOOQ blog about "How to Calculate Multiple Aggregate Functions in a Single Query", but it only covers plain SQL queries, not jOOQ.
Plain SQL queries almost always translate quite literally to jOOQ, so you can apply the technique from that article also in your case. In fact, you should! Running so many queries is definitely not a good idea.
Translating that linked query to jOOQ
So, let's look at how to translate that plain SQL example from the link to your case:
Record record =
ctx.select(
channels.stream()
.map(c -> count().filterWhere(CHANNEL.CODE.equal(c)).as(c))
.collect(toList())
)
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // Not strictly necessary, but might speed up things
.fetchOne(); // exactly one row is expected, so fetch a Record rather than a Result
This will produce a single record containing all the count values.
As always, this is assuming the following static import
import static org.jooq.impl.DSL.*;
Using classic GROUP BY
Of course, you can also just use a classic GROUP BY in your particular case. This might even be a bit faster:
Result<?> result =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // This time, you need to filter
.groupBy(CHANNEL.CODE)
.fetch(); // one row per channel code, so fetch the whole Result
This now produces a table with one count value per code. Alternatively, fetch this into a Map<String, Integer>:
Map<String, Integer> map =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels))
.groupBy(CHANNEL.CODE)
.fetchMap(CHANNEL.CODE, count());

JPA strange behavior when using SELECT

I am new to Java and am trying to develop a Swing app for a library, using generated JPA controllers.
When I try to select results from a SQL Server database, I use this code:
CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<BookTitles> cq = criteriaBuilder.createQuery(BookTitles.class);
cq.select(cq.from(BookTitles.class)).where(criteriaBuilder.isNull(cq.from(BookTitles.class).get("status")));
This code, however, returns the rows in the db about 9 times over. For example, if the db has 10 rows, it repeats these 10 rows around 9 times and returns a list with 90 elements.
I then changed the code to
CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<BookTitles> cq = criteriaBuilder.createQuery(BookTitles.class);
Root<BookTitles> root = cq.from(BookTitles.class);
cq.select(root).where(criteriaBuilder.isNull(root.get("status")));
and the results are the same as listed in the db.
The only difference between these two snippets is that instead of passing cq.from(...) directly to select(), I store the result of cq.from(...) in a variable and pass that.
Personally, I do not think there is any difference between these two ways of coding, but the results say otherwise.
Can someone take the time to explain?
It's not strange behavior
By calling the CriteriaQuery from method twice, you add two roots to the FROM clause, so the query returns their Cartesian product.
As you can see in the documentation
https://docs.oracle.com/javaee/7/api/javax/persistence/criteria/AbstractQuery.html#from-java.lang.Class-
"Create and add a query root corresponding to the given entity, forming a cartesian product with any existing roots."
So the correct way is the second one: store the root that forms the FROM clause in a variable and reuse it, instead of adding more roots to the FROM clause with further calls to the CriteriaQuery from method.
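To make the row multiplication concrete, here is a plain-Java simulation of the two queries. It does not use JPA at all: nested loops stand in for the Cartesian product of two roots. The Book class and the sample data (10 rows, 9 of them with a null status) are assumptions chosen purely for illustration, not taken from the question's database.

```java
import java.util.ArrayList;
import java.util.List;

class CartesianRootsDemo {

    /** Hypothetical stand-in for the BookTitles entity; only the status matters here. */
    static final class Book {
        final int id;
        final String status;

        Book(int id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    /** Builds 10 rows: row 1 has a status, the other 9 have none (assumed data). */
    static List<Book> sampleTable() {
        List<Book> table = new ArrayList<>();
        for (int id = 1; id <= 10; id++) {
            table.add(new Book(id, id == 1 ? "LENT" : null));
        }
        return table;
    }

    /** Two roots: SELECT r1 FROM BookTitles r1, BookTitles r2 WHERE r2.status IS NULL */
    static List<Book> selectWithTwoRoots(List<Book> table) {
        List<Book> result = new ArrayList<>();
        for (Book r1 : table) {          // first from(): the selected root
            for (Book r2 : table) {      // second from(): the root used in the predicate
                if (r2.status == null) {
                    result.add(r1);
                }
            }
        }
        return result;
    }

    /** One root: SELECT r FROM BookTitles r WHERE r.status IS NULL */
    static List<Book> selectWithOneRoot(List<Book> table) {
        List<Book> result = new ArrayList<>();
        for (Book r : table) {
            if (r.status == null) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Book> table = sampleTable();
        System.out.println("two roots: " + selectWithTwoRoots(table).size()); // 10 x 9 rows
        System.out.println("one root:  " + selectWithOneRoot(table).size());
    }
}
```

With two roots, the predicate filters only the second root, so every row of the first root is returned once per matching row of the second.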

How to use Querydsl to construct complex predicate that involves multiple tables?

I am trying to utilize Querydsl to fetch some results from a table. So far, this is what I have tried -
Assume there are 5 entities named T1..T5. And I am trying to do this SQL query in Querydsl -
SELECT T1.*
FROM T1,T2,T3,T4,T5
WHERE T1.A=T2.A
AND T2.B=T5.B
AND T4.C=T2.C
AND T1.B=1234;
I tried the following, but the Hibernate query keeps running, and does not seem to end.
booleanBuilder.and(JPAExpressions.select(qT1).from(qT1,qT2,qT3,qT4,qT5)
.where(
qT1.a.eq(qT2.a)
.and(qT1.a.eq(qT2.a))
... // and so on
.exists());
I am using the Repository that extends QuerydslPredicateExecutor and using findAll to execute this. The problem is that the query takes forever to run. And I am interested only in the first result that may appear.
So, where am I going wrong that is making the query execute forever?
Edit:
I opted to use the JPAQuery instead. And of course, the Hibernate query generated is the same. Here is my JPAQuery.
JPQLQuery jpqlQuery = new JPAQuery(entityManager);
jpqlQuery.select(qT1).from(qT1, qT2, qT3, qT4, qT5).where(booleanBuilder);
return jpqlQuery.fetch();
How do I incorporate the limit in the above JPAQuery so that only the first result is fetched?
The complexity is not in the predicate or in Querydsl, but in the fact that you are executing it in a subquery that has to be evaluated for every row in the result. Depending on the total result set size, this may become increasingly expensive to compute. It is, however, equally complex in Querydsl, Hibernate's HQL, JPA's JPQL, or your database's SQL, so the SQL you are trying to generate will be just as slow.
You might succeed at optimising the query using a limit clause. Adding a limit clause to a query in Querydsl is quite trivial: .limit(1). So then your query becomes:
JPQLQuery jpqlQuery = new JPAQuery(entityManager);
jpqlQuery.select(qT1).from(qT1, qT2, qT3, qT4, qT5).where(booleanBuilder);
jpqlQuery.limit(1);
return jpqlQuery.fetch();

Parse sql query using antlr parsetree to mongo bson document in Java

I have a SQL-like query, for example:
Select id,name from employee where age > 30 and department = 'IT' limit 200
The SQL query grammar is defined in an ANTLR4 grammar file. Is there any implementation that converts the parse tree of this query to a bson document?
The bson document will then be used to query a mongo db.
In one of my previous jobs I did something similar: I took a query (not SQL, but pretty similar) and translated it to a Mongo query with ANTLR.
I don't have code to share, but I can share my thoughts:
Mongo is not SQL compliant, so you can't just take a SQL grammar. What about JOINs and all the relational algebra? What about aggregations, which are pretty tricky in Mongo with its aggregation framework? In the opposite direction, how do you write SQL that gets translated to an "exists" clause in Mongo? There are many things like this, some small, some huge, but the bottom line is that you must be talking about some kind of subset of SQL: a DSL that is allowed as a query language and looks "like" SQL because people are used to SQL.
With that in mind, you should create your own grammar, and ANTLR will generate a lexer/parser for you. You also get a syntax check of the query at runtime for free: ANTLR won't be able to parse the query if it is not in the correct format, because some grammar rule will fail. This is another reason not to take SQL "as is".
So far so good: next you create your own listener / visitor. In my case I opted for creating an object representation of the query, with internal state and everything.
So the query
Select id,name
from employee
where age > 30
and department = 'IT'
limit 200
Was translated to objects of type:
class Query {
private SelectClause select;
private FromClause from;
private WhereClause where;
private Limit limit;
}
class SelectClause {
private List<String> fields;
}
...
class WhereClause {
Condition root;
}
interface Condition {
...
}
class AndCondition implements Condition { // the same for Not, Or
}
For this particular query it's something like:
Query q = new Query(
    new SelectClause(List.of("id", "name")),
    new FromClause("employee"),
    new WhereClause(new AndCondition(
        new SimpleLeafCondition("age", Operators.GT, 30),
        new SimpleLeafCondition("department", Operators.EQ, "IT"))),
    new Limit(200));
Then it is possible to make some optimizations in the query (like embedding where clauses if you need to or, for example, manipulating the "From" part if you are working in a multi-tenant environment and have different collections for different tenants).
Finally, you can apply the "Interpreter" design pattern: recursively walk the query objects and "translate" them to a valid Mongo query.
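A minimal sketch of that interpreter step for the where clause only. This is not the code from that job: the class names (MongoFilterInterpreter, LeafCondition, Operator) and the toMongo method are hypothetical. To stay self-contained it renders the filter as JSON text instead of building a driver object, and field names are assumed to come from the parser and are not escaped.

```java
import java.util.List;
import java.util.stream.Collectors;

class MongoFilterInterpreter {

    /** Comparison operators of the DSL, each mapped to its Mongo query operator. */
    enum Operator {
        EQ("$eq"), GT("$gt"), LT("$lt");

        final String mongoName;

        Operator(String mongoName) {
            this.mongoName = mongoName;
        }
    }

    /** Every node of the condition tree knows how to translate itself. */
    interface Condition {
        String toMongo();
    }

    /** Leaf node: field, operator, literal value. */
    static final class LeafCondition implements Condition {
        private final String field;
        private final Operator op;
        private final Object value;

        LeafCondition(String field, Operator op, Object value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }

        @Override
        public String toMongo() {
            return "{\"" + field + "\": {\"" + op.mongoName + "\": " + literal(value) + "}}";
        }
    }

    /** Composite node: translates its children recursively and wraps them in $and. */
    static final class AndCondition implements Condition {
        private final List<Condition> children;

        AndCondition(Condition... children) {
            this.children = List.of(children);
        }

        @Override
        public String toMongo() {
            return children.stream()
                    .map(Condition::toMongo)
                    .collect(Collectors.joining(", ", "{\"$and\": [", "]}"));
        }
    }

    /** Renders numbers and booleans bare, everything else as an escaped JSON string. */
    static String literal(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return "\"" + value.toString().replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Condition where = new AndCondition(
                new LeafCondition("age", Operator.GT, 30),
                new LeafCondition("department", Operator.EQ, "IT"));
        // prints {"$and": [{"age": {"$gt": 30}}, {"department": {"$eq": "IT"}}]}
        System.out.println(where.toMongo());
    }
}
```

Or and Not nodes would follow the same shape as AndCondition. In a real application the same recursion would more likely build a BSON document directly rather than a string.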
I remember that this step took me something like 1 day to accomplish (it was 7 years ago, with Mongo 2 I guess, but still), given the correct structure of objects representing the query, so this should not be that complicated. I'm bringing this up because it looks like it's your primary concern in the question.

Runtime SQL Query Builder

My question is similar to
Is there any good dynamic SQL builder library in Java?
However one important point taken from above thread:
Querydsl and jOOQ seem to be the most popular and mature choices however there's one thing to be aware of: Both rely on the concept of code generation, where meta classes are generated for database tables and fields. This facilitates a nice, clean DSL but it faces a problem when trying to create queries for databases that are only known at runtime.
Is there any way to create the queries at runtime besides just using plain JDBC + String concatenation?
What I'm looking for is a web application that can be used to build forms to query existing databases. If something like that already exists, links to such a product would be welcome too.
While source code generation for database meta data certainly adds much value to using jOOQ, it is not a prerequisite. Many jOOQ users use jOOQ for the same use-case that you envision. This is also reflected in the jOOQ tutorials, which list using jOOQ without code generation as a perfectly valid use-case. For example:
String sql = create.select(
fieldByName("BOOK","TITLE"),
fieldByName("AUTHOR","FIRST_NAME"),
fieldByName("AUTHOR","LAST_NAME"))
.from(tableByName("BOOK"))
.join(tableByName("AUTHOR"))
.on(fieldByName("BOOK", "AUTHOR_ID").eq(
fieldByName("AUTHOR", "ID")))
.where(fieldByName("BOOK", "PUBLISHED_IN").eq(1948))
.getSQL();
In a similar fashion, bind values can be extracted from any Query using Query.getBindValues().
This approach will still beat plain JDBC + String concatenation for dynamic SQL statements, as you do not need to worry about:
Syntax correctness
Cross-database compatibility
SQL Injection
Bind variable indexing
(Disclaimer: I work for the vendor of jOOQ)
SQLBuilder http://openhms.sourceforge.net/sqlbuilder/ is very useful for me.
Some simple examples:
String query1 = new InsertQuery("table1")
.addCustomColumn("s01", "12")
.addCustomColumn("stolbez", 19)
.addCustomColumn("FIRSTNAME", "Alexander")
.addCustomColumn("LASTNAME", "Ivanov")
.toString();
String query2 = new UpdateQuery("table2")
.addCustomSetClause("id", 1)
.addCustomSetClause("FIRSTNAME", "Alexander")
.addCustomSetClause("LASTNAME", "Ivanov")
.toString();
Results:
INSERT INTO table1 (s01,stolbez,FIRSTNAME,LASTNAME) VALUES ('12',19,'Alexander','Ivanov')
UPDATE table2 SET id = 1,FIRSTNAME = 'Alexander',LASTNAME = 'Ivanov'
I have a custom solution for dynamically generating such SQL queries with just 2-3 classes, written for a similar requirement. It is a simple approach.
It is described at Creating Dynamic SQL queries in Java
For simpler use cases, like a dynamic filter condition based on the inputs selected in the UI, one can use the simpler approach below and write the query directly in this style:
select t1.id, t1.col1, t1.col2
from table1 t1
where (:col1Value is null or t1.col1 = :col1Value)
and (:col2Value is null or t1.col2 = :col2Value);
Here values for col1 or col2 can be null but the query will work fine.
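A different technique with the same goal is to append a condition only for the inputs that are actually set, and collect the bind values in the same order. The sketch below is plain Java with no library. The class DynamicFilter and its build method are hypothetical names, and the column names are assumed to come from a trusted whitelist, never from user input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DynamicFilter {

    final String sql;          // SQL text with positional ? placeholders
    final List<Object> binds;  // bind values, in placeholder order

    private DynamicFilter(String sql, List<Object> binds) {
        this.sql = sql;
        this.binds = binds;
    }

    /**
     * Appends "column = ?" for every filter whose value is not null.
     * Column names must be trusted (whitelisted); only values are bound.
     */
    static DynamicFilter build(String baseSelect, Map<String, Object> filters) {
        StringBuilder sql = new StringBuilder(baseSelect);
        List<Object> binds = new ArrayList<>();
        String keyword = " where ";
        for (Map.Entry<String, Object> filter : filters.entrySet()) {
            if (filter.getValue() == null) {
                continue; // input not set in the UI: no condition for this column
            }
            sql.append(keyword).append(filter.getKey()).append(" = ?");
            binds.add(filter.getValue());
            keyword = " and ";
        }
        return new DynamicFilter(sql.toString(), binds);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order and allows null values
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("t1.col1", "abc");
        filters.put("t1.col2", null);

        DynamicFilter filter = build("select t1.id, t1.col1, t1.col2 from table1 t1", filters);
        System.out.println(filter.sql);
        System.out.println(filter.binds);
    }
}
```

The resulting text and values can then be passed to a PreparedStatement, setting each value at its 1-based position. Unlike the ":param is null or ..." style, the database only sees the conditions that apply, at the cost of building the string in code.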
