Use existing JPAQuery as subquery

I have an instance of JPAQuery<?> and need to retrieve the count. However, since the table may contain many items (millions), I want to limit the count to a given maximum, say 50,000.
The current QueryDSL-Code effectively does this:
query.fetchCount();
Now my desired modifications are quite trivial in raw sql:
select count(*) from (<whatever query> limit 50000);
However, I do not know how I would express this in querydsl. The following code is not correct, because .from() takes an entity path, but query is a query:
JPAExpressions.select(Wildcard.all)
.from(query.limit(50000))
.fetchCount();
I am using querydsl 4.

JPAExpressions.select(Wildcard.all) returns a child of SimpleQuery, which you can call limit on.
JPAExpressions.select(Wildcard.all)
.from(entity)
.limit(50000)
.fetchCount();


How to use Querydsl to construct complex predicate that involves multiple tables?

I am trying to utilize Querydsl to fetch some results from a table. So far, this is what I have tried -
Assume there are 5 entities named T1..T5. And I am trying to do this SQL query in Querydsl -
SELECT T1.*
FROM T1,T2,T3,T4,T5
WHERE T1.A=T2.A
AND T2.B=T5.B
AND T4.C=T2.C
AND T1.B=1234;
I tried the following, but the Hibernate query keeps running, and does not seem to end.
booleanBuilder.and(JPAExpressions.select(qT1).from(qT1,qT2,qT3,qT4,qT5)
.where(
qT1.a.eq(qT2.a)
.and(qT1.a.eq(qT2.a))
... // and so on
.exists());
I am using the Repository that extends QuerydslPredicateExecutor and using findAll to execute this. The problem is that the query takes forever to run. And I am interested only in the first result that may appear.
So, where am I going wrong that is making the query execute forever?
Edit:
I opted to use the JPAQuery instead. And of course, the Hibernate query generated is the same. Here is my JPAQuery.
JPQLQuery jpqlQuery = new JPAQuery(entityManager);
jpqlQuery.select(qT1).from(qT1, qT2, qT3, qT4, qT5).where(booleanBuilder);
return jpqlQuery.fetch();
How do I incorporate the limit in the above JPAQuery so that only the first result is fetched?
The complexity is not in the predicate or in QueryDSL, but in the fact that you're executing it in a subquery that has to be executed for every row in the result. Depending on the total result set size, this may become increasingly expensive to compute. It is, however, equally expensive in QueryDSL, Hibernate's HQL, JPA's JPQL or your database's SQL. So the SQL you're trying to generate will be just as slow.
You might succeed at optimising the query using a limit clause. Adding a limit clause to a query in QueryDSL is quite trivial: .limit(1). So then your query becomes:
JPQLQuery jpqlQuery = new JPAQuery(entityManager);
jpqlQuery.select(qT1).from(qT1, qT2, qT3, qT4, qT5).where(booleanBuilder);
jpqlQuery.limit(1);
return jpqlQuery.fetch();

What is the limit of hibernate in clause

We know Hibernate has this IN clause:
Criteria criteria = session.createCriteria(User.class);
criteria.add(Restrictions.in("id", userIds)); // "id" is an assumed property name; Restrictions.in needs one
Is there any limit on the size of userIds (which is an ArrayList, say)?
Thanks
It actually depends on the particular database you use. For example in Oracle this limit is 1000.
If you need to pass more values you need to use another approach. For example put the values into a temporary table and then do a select where id in (select id from temptable) query.
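A different workaround from the temporary table described above is to split the id list into batches that each stay under the database's limit and run one IN query per batch. The following is only a minimal sketch of the splitting step in plain Java; the class name InClauseChunks, the helper chunk and the constant ORACLE_IN_LIMIT are hypothetical names made up for illustration, and the actual query per batch is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class InClauseChunks {
    // Assumed cap, taken from the Oracle figure quoted in the answer; other databases differ.
    static final int ORACLE_IN_LIMIT = 1000;

    /** Splits ids into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> userIds = new ArrayList<>();
        for (int i = 1; i <= 2500; i++) {
            userIds.add(i);
        }
        for (List<Integer> batch : chunk(userIds, ORACLE_IN_LIMIT)) {
            // Each batch would be bound to its own "id IN (...)" restriction.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

The results of the per-batch queries then have to be merged in code, so this only makes sense when the number of batches stays small; for very large id sets the temporary table is probably the better fit.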

Better to query once, then organize objects based on returned column value, or query twice with different conditions?

I have a table which I need to query, then organize the returned objects into two different lists based on a column value. I can either query the table once, retrieving the column by which I would differentiate the objects and arrange them by looping through the result set, or I can query twice with two different conditions and avoid the sorting process. Which method is generally better practice?
MY_TABLE
NAME AGE TYPE
John 25 A
Sarah 30 B
Rick 22 A
Susan 43 B
Either SELECT * FROM MY_TABLE, then sort in code based on returned types, or
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'A' followed by
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'B'
Logically, a DB query from Java code will be more expensive than a loop within the code, because querying the DB involves several steps such as connecting to the DB, creating the SQL query, firing the query and getting the results back.
Besides, something can go wrong between firing the first and second query.
With an optimized single query and a loop in the code, you can save a lot of time compared with firing two queries.
In your case, you can sort in the query itself if it helps:
SELECT * FROM MY_TABLE ORDER BY TYPE
In future if there are more types added to your table, you need not fire an additional query to retrieve it.
It is heavily dependent on the context. If each list is really huge, I would let the database do the hard part of the job with 2 queries. Conversely, in a web application using a farm of application servers and a central database, I would use one single query.
For the general use case, IMHO, I would save database resources, because the database is a common point of congestion, and use only one query.
The only objective argument I can find is that the splitting of the list occurs in memory with a hyper simple algorithm and in a single JVM, where each query requires a bit of initialization and may involve disk access or loading of index pages.
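To make the "hyper simple algorithm" concrete, here is a minimal sketch of splitting one result set into per-type lists in a single pass. It assumes the rows have already been mapped to objects; the Row class and the splitByType helper are hypothetical names, with fields mirroring the NAME, AGE and TYPE columns from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitByType {
    /** One row of MY_TABLE (hypothetical mapping of NAME, AGE, TYPE). */
    static final class Row {
        final String name;
        final int age;
        final String type;

        Row(String name, int age, String type) {
            this.name = name;
            this.age = age;
            this.type = type;
        }
    }

    /** Groups rows by TYPE in one pass, keeping the order in which rows arrive. */
    static Map<String, List<Row>> splitByType(List<Row> rows) {
        Map<String, List<Row>> byType = new LinkedHashMap<>();
        for (Row row : rows) {
            byType.computeIfAbsent(row.type, k -> new ArrayList<>()).add(row);
        }
        return byType;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("John", 25, "A"),
                new Row("Sarah", 30, "B"),
                new Row("Rick", 22, "A"),
                new Row("Susan", 43, "B"));
        Map<String, List<Row>> byType = splitByType(rows);
        for (Map.Entry<String, List<Row>> entry : byType.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().size() + " rows");
        }
    }
}
```

Because the map is keyed by the type value, a new type appearing in the table later just produces another list without any change to the query or the loop.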
In general, one query performs better.
Also, by issuing two queries you can potentially get inconsistent results (although this may be fixed with a higher transaction isolation level).
In any case I believe you still need to iterate through resultset (either directly or by using framework's methods that return collections).
From the database point of view, you optimally have exactly one statement that fetches exactly everything you need and nothing else. Therefore, your first option is better. But don't generalize that answer in way that makes you query more data than needed. It's a common mistake for beginners to select all rows from a table (no where clause) and do the filtering in code instead of letting the database do its job.
It also depends on your dataset volume. For instance, if you have a large data set, doing a select * without any condition might take some time, but if you have an index on your 'TYPE' column, then adding a where clause will reduce the time taken to execute the query. If you are dealing with a small data set, then doing a select * followed by your logic in the Java code is a better approach.
There are four main bottlenecks involved in querying a database.
The query itself - how long the query takes to execute on the server depends on indexes, table sizes etc.
The data volume of the results - there could be hundreds of columns or huge fields and all this data must be serialised and transported across the network to your client.
The processing of the data - Java must walk the query results gathering the data it wants.
Maintaining the query - it takes manpower to maintain queries, simple ones cost little but complex ones can be a nightmare.
By careful consideration it should be possible to work out a balance between all four of these factors - it is unlikely that you will get the right answer without doing so.
You can query by two conditions:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B'
This will do both for you at once, and if you want them sorted, you could do the same, but just add an ORDER BY clause:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B' ORDER BY TYPE ASC
This will sort the results by type, in ascending order.
EDIT:
I didn't notice that originally you wanted two different lists. In that case, you could just do this query, and then find the index where the type changes from 'A' to 'B' and copy the data into two arrays.
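A minimal sketch of that last step, assuming the rows come back ordered by TYPE and have been mapped to objects. The Row class and the boundary helper are hypothetical names; subList is used here instead of copying into arrays, which gives two views over the same list.

```java
import java.util.List;

final class SplitSortedRows {
    /** One row of MY_TABLE (hypothetical mapping of NAME, AGE, TYPE). */
    static final class Row {
        final String name;
        final int age;
        final String type;

        Row(String name, int age, String type) {
            this.name = name;
            this.age = age;
            this.type = type;
        }
    }

    /** Returns the index of the first row whose type differs from firstType, or rows.size(). */
    static int boundary(List<Row> sortedRows, String firstType) {
        int i = 0;
        while (i < sortedRows.size() && sortedRows.get(i).type.equals(firstType)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Same data as the question, in the order ORDER BY TYPE ASC would return it.
        List<Row> rows = List.of(
                new Row("John", 25, "A"),
                new Row("Rick", 22, "A"),
                new Row("Sarah", 30, "B"),
                new Row("Susan", 43, "B"));
        int split = boundary(rows, "A");
        List<Row> typeA = rows.subList(0, split);
        List<Row> typeB = rows.subList(split, rows.size());
        System.out.println(typeA.size() + " rows of type A, " + typeB.size() + " rows of type B");
    }
}
```

This only works for exactly two types and relies on the ORDER BY; with more types, grouping into a map keyed by type is less fragile.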

JOOQ How to select the min 'id' from a table

In mysql I want to execute a query like this
SELECT MIN(id) FROM table;
The more I read about JOOQ syntax and the aggregate functions, the more confused I get.
I thought something like this would work
select( EVENT.EVENTID , min() ).from( EVENT ).fetch();
or
Result<Integer> er = context.select( EVENT.EVENTID.min()).fetch();
I tried a work around by selecting the whole first record
Result<EventRecord> er2 = context.selectFrom(EVENT).orderBy(EVENT.EVENTID.asc()).limit(1).fetch();
If the result has size 0, a record does not exist, but when it is not 0 I get the right record. I would like to use the min() function but can't get the syntax right.
The query you want to write in SQL is this one:
SELECT MIN(event.eventid) FROM event
This is why your two attempts didn't work
// 1. You cannot combine single columns with aggregate functions in SQL,
// unless you're grouping by those columns
// 2. You didn't pass any argument column to the MIN() function
context.select( EVENT.EVENTID , min() ).from( EVENT ).fetch();
// 3. This doesn't specify any FROM clause, so your database won't know what
// table you want to select the MIN(eventid) from
context.select( EVENT.EVENTID.min()).fetch();
Note that these thoughts are not specific to jOOQ, they are related to SQL in general. When using jOOQ, always think of the SQL statement you want to express first (the one at the top of my answer). So your jOOQ statement would look like any of these:
// "Postfix notation" for MIN()
context.select(EVENT.EVENTID.min()).from(EVENT).fetch();
// "Prefix notation" for MIN(), where min() is static-imported from
// org.jooq.impl.DSL
context.select(min(EVENT.EVENTID)).from(EVENT).fetch();
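The fetchAny() method returns at most one record. In practice that is often the record with the lowest id, but without an ORDER BY clause the database does not guarantee it, so it is safer to order explicitly:
EventRecord record = context.selectFrom(EVENT).orderBy(EVENT.EVENTID.asc()).fetchAny();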
As @LukasEder mentioned, there are many alternative methods, and he may be generous and follow up on some of those. Thanks, Lukas.

How to order by count() in JPA

I am using this JPA query:
SELECT DISTINCT e.label FROM Entity e
GROUP BY e.label
ORDER BY COUNT(e.label) DESC
I get no errors and the results are sorted almost correctly, but some values are wrong (either two values are flipped or some single values are completely misplaced).
EDIT:
Adding COUNT(e.label) to my SELECT clause resolves this problem for this query.
But in a similar query which also contains a WHERE clause the problem persists:
SELECT DISTINCT e.label, COUNT(e.label) FROM Entity e
WHERE TYPE(e.cat) = :category
GROUP BY e.label
ORDER BY COUNT(e.label) DESC
You might need to include the COUNT(e.label) in your SELECT clause:
SELECT DISTINCT e.label, COUNT(e.label)
FROM Entity e
GROUP BY e.label
ORDER BY COUNT(e.label) DESC
UPDATE: Regarding the second query please read section 8.6. Polymorphic queries of the EntityManager documentation. It seems that if you make your queries in a way that requires multiple SELECTs, then the ORDER BY won't work anymore. Using the TYPE keyword seems to be such a case. A quote from the above link:
The following query would return all persistent objects:
from java.lang.Object o // HQL only
The interface Named might be implemented by various persistent classes:
from Named n, Named m where n.name = m.name // HQL only
Note that these last two queries will require more than one SQL SELECT. This means that the order by clause does not correctly order the whole result set. (It also means you can't call these queries using Query.scroll().)
For whatever reason the following style named query didn't work for me:
SELECT DISTINCT e.label, COUNT(e.label)
FROM Entity e
GROUP BY e.label
ORDER BY COUNT(e.label) DESC
It could be because I am using an old version of Hibernate. I got the order by working by using a number to choose the column to sort by like this:
SELECT DISTINCT e.label, COUNT(e.label)
FROM Entity e
GROUP BY e.label
ORDER BY 2 DESC
Can't see how the order could be incorrect. What is the incorrect result?
What is the SQL that is generated, if you try the same SQL directly on the database, does it give the same incorrect order?
What database are you using?
You could always sort in Java instead using sort().
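As a rough sketch of that fallback, the counting and ordering can both be done in Java once the labels are in memory. This assumes the labels were fetched with a plain query without GROUP BY or ORDER BY (for example SELECT e.label FROM Entity e); the class LabelCounts and the method countAndSort are hypothetical names. Ties are broken alphabetically so the order is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LabelCounts {
    /** Counts each label and returns the entries ordered by count descending, then label ascending. */
    static List<Map.Entry<String, Long>> countAndSort(List<String> labels) {
        Map<String, Long> counts = labels.stream()
                .collect(Collectors.groupingBy(label -> label, Collectors.counting()));
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("b", "a", "b", "c", "a", "b");
        for (Map.Entry<String, Long> entry : countAndSort(labels)) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

This moves every label row over the network, so it is only reasonable for modest result sizes; when the database can do the grouping and ordering correctly, that remains the cheaper option.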
