JOOQ multiple select count in one connection with PostgreSQL - java

I have a table SUBSCRIPTION, I want to run multiple selectCount written with JOOQ in one connection with different predicates to the database.
To do so, I have created a list of queries:
List<Query> countQueries = channels.stream().map(c ->
selectCount().from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.equal(c))
).collect(toList());
And finally, I have launched this list of queries using batch:
using(configuration).batch(countQueries).execute();
I have expected to have the results of the above queries in the return values of execute, but I get an array of integer filled with 0 values.
Is this the right way to run multiple selectCount using JOOQ?
What is the signification of the integer array returned by the execute method?
I have checked this link, in the JOOQ blog, talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but It's just about SQL queries, no JOOQ dialects.

Comments on your assumptions
I have expected to have the results of the above queries in the return values of execute, but I get an array of integer filled with 0 values.
The batch() API can only be used for DML queries (INSERT, UPDATE, DELETE), just like with native JDBC. I mean, you can run the queries as a batch, but you cannot fetch the results this way.
I have checked this link, in the JOOQ blog, talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but It's just about SQL queries, no JOOQ dialects.
Plain SQL queries almost always translate quite literally to jOOQ, so you can apply the technique from that article also in your case. In fact, you should! Running so many queries is definitely not a good idea.
Translating that linked query to jOOQ
So, let's look at how to translate that plain SQL example from the link to your case:
Record record =
ctx.select(
channels.stream()
.map(c -> count().filterWhere(CHANNEL.CODE.equal(c)).as(c))
.collect(toList())
)
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // Not strictly necessary, but might speed up things
.fetch();
This will produce a single record containing all the count values.
As always, this is assuming the following static import
import static org.jooq.impl.DSL.*;
Using classic GROUP BY
Of course, you can also just use a classic GROUP BY in your particular case. This might even be a bit faster:
Result<?> result =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // This time, you need to filter
.groupBy(CHANNEL.CODE)
.fetchOne();
This now produces a table with one count value per code. Alternatively, fetch this into a Map<String, Integer>:
Map<String, Integer> map =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels))
.groupBy(CHANNEL.CODE)
.fetchMap(CHANNEL.CODE, count());

Related

How to sum values from a database column in AnyLogic?

as a newbie I want to sum the values of a column pv from a database table evm in my model and store it in a variable. I have tried the SQL code SELECT SUM(pv) FROM evm; but that doesn't seem to work.I would be grateful if you lend me an aid regarding how to pull this one.
You can always write a native query and get the response in the resultset to populate the field of your pojo. Once you have the POJO/DTO created as the list of result set perform your sum on the field by Iterating the list.
You do just use the SQL you have suggested. (The database in an AnyLogic model is a standard HSQLDB database which supports this SQL syntax.)
The simplest way to execute it is to use AnyLogic's in-built functions for such queries (as would be produced by the Insert Database Query wizard), so
mySumVariable = selectFirstValue("SELECT SUM(pv) FROM evm;");
You didn't say what errors you had; obviously the table and column has to exist (and the column you're summing needs to be numeric, though NULLs are OK), as does the variable you're assigning the sum to.
If you wanted to do this in a way which more easily fits one of the standard query 'forms' suggested by the wizard (i.e., not having to know particular SQL syntax) you could just adapt the "Iterate over returned rows and do something" code to 'explicitly' sum the columns; e.g., (using the Query DSL format this time):
List<Tuple> rows = selectFrom(evm).list();
for (Tuple row : rows) {
mySumVariable += row.get(evm.pv);
}

How to use SUM inside COALESCE in JOOQ

Given below is a gist of the query, which I'm able to run successfully in MySQL
SELECT a.*,
COALESCE(SUM(condition1 or condition2), 0) as countColumn
FROM table a
-- left joins with multiple tables
GROUP BY a.id;
Now, I'm trying to use it with JOOQ.
ctx.select(a.asterisk(),
coalesce(sum("How to get this ?")).as("columnCount"))
.from(a)
.leftJoin(b).on(someCondition)
.leftJoin(c).on(someCondition))
.leftJoin(d).on(someCondition)
.leftJoin(e).on(someCondition)
.groupBy(a.ID);
I'm having a hard time preparing the coalesce() part, and would really appreciate some help.
jOOQ's API is more strict about the distinction between Condition and Field<Boolean>, which means you cannot simply treat booleans as numbers as you can in MySQL. It's usually not a bad idea to be explicit about data types to prevent edge cases, so this strictness isn't necessarly a bad thing.
So, you can transform your booleans to integers as follows:
coalesce(
sum(
when(condition1.or(condition2), inline(1))
.else_(inline(0))
),
inline(0)
)
But even better than that, why not use a standard SQL FILTER clause, which can be emulated in MySQL using a COUNT(CASE ...) aggregate function:
count().filterWhere(condition1.or(condition2))

Add limit and offset to query that was created from a String

I have query as String like
select name from employee
and want to limit the number of rows with limit and offset.
Is this possible with jOOQ and how do I do that?
Something like:
dsl.fetch("select name from employee").limit(10).offset(10);
Yes you're close, but you cannot use fetch(sql), because that eagerly executes the query and it will be too late to append LIMIT and OFFSET. I generally don't recommend the approach offered by Sergei Petunin, because that way, you will tell the RDBMS less information about what you're going to do. The execution plan and resource allocations are likely going to be better if you actually use LIMIT and OFFSET.
There are two ways to do what you want to achieve:
Use the parser
You can use DSLContext.parser() to parse your SQL query and then modify the resulting SelectQuery, or create a derived table from that. Creating a derived table is probably a bit cleaner:
dsl.selectFrom(dsl.parser().parse("select name from employee"))
.limit(10)
.offset(10)
.fetch();
The drawback is that the parser will have to understand your SQL string. Some vendor specific features will no longer be available.
The advantage (starting from jOOQ 3.13) is that you will be able to provide your generated code with attached converters and data type bindings this way, as jOOQ will "know" what the columns are.
Use plain SQL
You were already using plain SQL, but the wrong way. Instead of fetching the data eagerly, just wrap your query in DSL.table() and then use the same approach as above.
When using plain SQL, you will have to make sure manually, that the resulting SQL is syntactically correct. This includes wrapping your query in parentheses, and possibly aliasing it, depending on the dialect you're using:
dsl.selectFrom(table("(select name from employee)").as("t"))
.limit(10)
.offset(10)
.fetch();
The best thing you can do with a string query is to create a ResultQuery from it. It allows you to limit the maximum amount of rows fetched by the underlying java.sql.Statement:
create.resultQuery("select name from employee").maxRows(10).fetch();
or to fetch lazily and then scroll through the cursor:
create.resultQuery("select name from employee").fetchLazy().fetch(10);
Adding an offset or a limit to a query is only possible using a SelectQuery, but I don't think there's any way to transform a string query to a SelectQuery in JOOQ.
Actually, if you store SQL queries as strings in the database, then you are already in a non-typesafe area, and might as well append OFFSET x LIMIT y directly to a string-based query. Depending on the complexity of your queries, it might work.

Better to query once, then organize objects based on returned column value, or query twice with different conditions?

I have a table which I need to query, then organize the returned objects into two different lists based on a column value. I can either query the table once, retrieving the column by which I would differentiate the objects and arrange them by looping through the result set, or I can query twice with two different conditions and avoid the sorting process. Which method is generally better practice?
MY_TABLE
NAME AGE TYPE
John 25 A
Sarah 30 B
Rick 22 A
Susan 43 B
Either SELECT * FROM MY_TABLE, then sort in code based on returned types, or
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'A' followed by
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'B'
Logically, a DB query from a Java code will be more expensive than a loop within the code because querying the DB involves several steps such as connecting to DB, creating the SQL query, firing the query and getting the results back.
Besides, something can go wrong between firing the first and second query.
With an optimized single query and looping with the code, you can save a lot of time than firing two queries.
In your case, you can sort in the query itself if it helps:
SELECT * FROM MY_TABLE ORDER BY TYPE
In future if there are more types added to your table, you need not fire an additional query to retrieve it.
It is heavily dependant on the context. If each list is really huge, I would let the database to the hard part of the job with 2 queries. At the opposite, in a web application using a farm of application servers and a central database I would use one single query.
For the general use case, IMHO, I will save database resource because it is a current point of congestion and use only only query.
The only objective argument I can find is that the splitting of the list occurs in memory with a hyper simple algorithm and in a single JVM, where each query requires a bit of initialization and may involve disk access or loading of index pages.
In general, one query performs better.
Also, with issuing two queries you can potentially get inconsistent results (which may be fixed with higher transaction isolation level though ).
In any case I believe you still need to iterate through resultset (either directly or by using framework's methods that return collections).
From the database point of view, you optimally have exactly one statement that fetches exactly everything you need and nothing else. Therefore, your first option is better. But don't generalize that answer in way that makes you query more data than needed. It's a common mistake for beginners to select all rows from a table (no where clause) and do the filtering in code instead of letting the database do its job.
It also depends on your dataset volume, for instance if you have a large data set, doing a select * without any condition might take some time, but if you have an index on your 'TYPE' column, then adding a where clause will reduce the time taken to execute the query. If you are dealing with a small data set, then doing a select * followed with your logic in the java code is a better approach
There are four main bottlenecks involved in querying a database.
The query itself - how long the query takes to execute on the server depends on indexes, table sizes etc.
The data volume of the results - there could be hundreds of columns or huge fields and all this data must be serialised and transported across the network to your client.
The processing of the data - java must walk the query results gathering the data it wants.
Maintaining the query - it takes manpower to maintain queries, simple ones cost little but complex ones can be a nightmare.
By careful consideration it should be possible to work out a balance between all four of these factors - it is unlikely that you will get the right answer without doing so.
You can query by two conditions:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B'
This will do both for you at once, and if you want them sorted, you could do the same, but just add an order by keyword:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B' ORDER BY TYPE ASC
This will sort the results by type, in ascending order.
EDIT:
I didn't notice that originally you wanted two different lists. In that case, you could just do this query, and then find the index where the type changes from 'A' to 'B' and copy the data into two arrays.

Efficient way to check if record (from a large set of data) is existing in the Database (JPA/Hibernate)

We have a large set of data (bulk data) that needs to be checked if the record is existing in the database.
We are using SQL Server2012/JPA/Hibernate/Spring.
What would be an efficient or recommended way to check if a record exists in the database?
Our entity ProductCodes has the following fields:
private Integer productCodeId // this is the PK
private Integer refCode1 // ref code 1-5 has a unique constraint
private Integer refCode2
private Integer refCode3
private Integer refCode4
private Integer refCode5
... other fields
The service that we are creating will be given a file where each line is a combination of refCode1-5.
The task of the service is to check and report all lines in the file that are already existing in the database.
We are looking at approaching this in two ways.
Approach1: Usual approach.
Loop through each line and call the DAO to query the refCode1-5 if existing in the db.
//psuedo code
for each line in the file
call dao. pass the refCode1-5 to query
(select * from ProductCodes where refCode1=? and refCode2=? and refCode3=? and refCode4=? and refCode5=?
given a large list of lines to check, this might be inefficient since we will be invoking the DAO xxxx number of times. If the file say consists of 1000 lines to check, this will be 1000 connections to the DB
Approach2: Query all records in the DB approach
We will query all records in the DB
Create a hash map with concatenated refCode1-5 as keys
Loop though each line in the file validating against the hashmap
We think this is more efficient in terms of DB connection since it will not create 1000 connections to the DB. However, if the DB table has for example 5000 records, then hibernate/jpa will create 5000 entities in memory and probably crash the application
We are thinking of going for the first approach since refCode1-5 has a unique constraint and will benefit from the implicit index.
But is there a better way of approaching this problem aside from the first approach?
try something like a batch select statement for say 100 refCodes instead of doing a single select for each refCode.
construct a query like
select <what ever you want> from <table> where ref_code in (.....)
Construct the select projection in a way that not just gives you wnat you want but also the details of ref_code. Teh in code you can do a count or multi-threaded scan of resultset if DB said you got less refCodes that the number you codes you entered in query.
You can try to use the concat operator.
select <your cols> from <your table> where concat(refCode1, refCode2, refCode3, refCode4, refCode5) IN (<set of concatenation from your file>);
I think this will be quite efficient and it may be worth to try to see if pre-sorting the lines and playing with the num of concatenation taken each times bring you some benefits.
I would suggest you create a temp table in your application where all records from file are stored initially with batch save, and later you run a query joining new temp table and productCodes table to achieve filtering how you like. In this way you are not locking productCodes table many times to check individual row as SqlServer locks rows on select statement as well.

Categories