I have the following schema (SQLite) for capturing messages from messengers:
create table if not exists spot_message
(
id unsigned big int primary key not null,
messenger_id text not null,
messenger_name text not null,
message_type text not null,
timestamp text not null,
latitude real not null,
longitude real not null
);
I use the following self-join in order to find the latest message from every sender:
select t1.*
from spot_message t1
join (select messenger_id, max(timestamp) timestamp
from spot_message
group by messenger_id) t2 on t1.messenger_id = t2.messenger_id and t1.timestamp = t2.timestamp;
It is not clear to me how to represent this in jOOQ.
I currently have:
DSL.using(c.get())
   .select(asterisk())
   .from(table("spot_message").as("t1"))
   .join(select(field("messenger_id"), max(field("timestamp")))
       .from(table("spot_message"))
       .groupBy(field("messenger_id")))
   .on(field("messenger_id")
       .eq(field("messenger_id"))
       .and(field("timestamp")
           .eq(field("timestamp"))));
But it is not clear to me how to express the "as" for the table name of the joined derived table ("t2").
Answering your question
To alias a Select as a derived table, you can either use:
Select.asTable(alias)
DSL.table(select).as(alias)
You're using the plain SQL API for query building, so you can just use any SQL expression in your strings, such as "t2.messenger_id". This should work:
DSL.using(c.get())
   .select(asterisk())
   .from(table("spot_message").as("t1"))
   .join(select(field("messenger_id"), max(field("timestamp")))
       .from(table("spot_message"))
       .groupBy(field("messenger_id"))
       .asTable("t2"))                     // t2 alias here
   .on(field("t1.messenger_id")           // Qualifications here, and below
       .eq(field("t2.messenger_id"))
       .and(field("t1.timestamp")
           .eq(field("t2.timestamp"))));
However, this would be a bit more readable if you were using code generation, which I recommend for various reasons:
SpotMessage t1 = SPOT_MESSAGE.as("t1");
SpotMessage t2 = SPOT_MESSAGE.as("t2");
DSL.using(c.get())
   .select(t1.asterisk())
   .from(t1)
   .join(
        select(t2.MESSENGER_ID, max(t2.TIMESTAMP).as(t2.TIMESTAMP))
        .from(t2)
        .groupBy(t2.MESSENGER_ID)
        .asTable(t2))
   .on(t1.MESSENGER_ID.eq(t2.MESSENGER_ID))
   .and(t1.TIMESTAMP.eq(t2.TIMESTAMP));
Alternative using window functions
Since you're using SQLite, which supports window functions, why not just use those? You can even use the QUALIFY syntax, which jOOQ can emulate for you if you're using the commercial distributions:
In SQL:
select *
from spot_message
qualify timestamp = max(timestamp) over (partition by messenger_id)
In jOOQ:
ctx.selectFrom(SPOT_MESSAGE)
   .qualify(SPOT_MESSAGE.TIMESTAMP.eq(
       max(SPOT_MESSAGE.TIMESTAMP).over(partitionBy(SPOT_MESSAGE.MESSENGER_ID))
   ));
If QUALIFY isn't available to you, you can still emulate it manually by wrapping the query in a derived table:
SELECT *
FROM (
  SELECT
    spot_message.*,
    timestamp = MAX(timestamp) OVER (PARTITION BY messenger_id) AS is_max
  FROM spot_message
) AS t
WHERE is_max
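Side note: if portability is not a concern, SQLite also has a documented, non-standard shortcut for this. When a query uses max() (or min()) in the select list, SQLite takes the remaining "bare" columns from the row that holds the maximum, so a plain GROUP BY already yields the latest message per sender (ties are broken arbitrarily). A minimal sketch against the spot_message table above:
-- SQLite-specific behavior: the bare columns come from the row
-- holding the maximum timestamp within each group
SELECT id, messenger_id, messenger_name, message_type,
       max(timestamp) AS timestamp, latitude, longitude
FROM spot_message
GROUP BY messenger_id;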
I have an eCommerce app with an Item entity. Whenever an item's end date/time is reached, the item's status should change (I also need to execute other SQL operations, such as inserting a row into another table).
Basically, I want to execute a SQL operation every minute that checks the database and changes entities.
I have a few ideas on how to implement this:
Schedule a job on my Linux server that checks the DB every minute
Use sp_executesql (Transact-SQL) or DBMS Scheduler
Have a thread running in my Java backend that checks the DB and executes the operations
I am very new to this, so I don't have any idea how to implement it. What is the most efficient implementation, taking scalability and performance into account?
Other information: the database is SQL Server, the server is Linux, and the backend is Java Spring Boot.
If you need to run a script after an insert or update, you can consolidate all that complex logic (e.g. inserting rows into other tables, updating the status column, etc.) in a trigger:
Here's a sample table schema:
CREATE TABLE t1 (id INT IDENTITY(1,1), start_time DATETIME, end_time DATETIME,
status VARCHAR(25))
And a sample insert/update trigger for that table:
CREATE TRIGGER u_t1
ON t1
AFTER INSERT, UPDATE
AS
BEGIN
    UPDATE t1
    SET status = CASE WHEN inserted.end_time = inserted.start_time
                      THEN 'same' ELSE 'different' END
    FROM t1
    INNER JOIN inserted ON t1.id = inserted.id
    -- do anything else you want!
    -- e.g.
    -- INSERT INTO t2 (id, status) SELECT id, status FROM inserted
END
GO
Insert a couple test records:
INSERT INTO t1 (start_time, end_time)
VALUES
(GETDATE(), GETDATE() - 1), -- different
(GETDATE(), GETDATE()) -- same
Query the table after the inserts:
SELECT * FROM t1
See that the status is calculated correctly:
id start_time end_time status
1 2018-07-17 02:53:24.577 2018-07-16 02:53:24.577 different
2 2018-07-17 02:53:24.577 2018-07-17 02:53:24.577 same
If your only goal is to update the status column based on other values in the table, then a computed column is the simplest approach; you just supply the formula:
create table t1 (id int identity(1,1), start_time datetime, end_time datetime,
    status as
        case
            when start_time is null then 'start null'
            when end_time is null then 'end null'
            when start_time < end_time then 'start less'
            when end_time < start_time then 'end less'
            when start_time = end_time then 'same'
            else 'what?'
        end
)
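To sanity-check the formula, the same insert/select as in the trigger example works here as well (a sketch, assuming you created this computed-column version of t1; note that you never insert into status, it is always derived):
-- Insert two rows; status is computed automatically, no trigger involved
INSERT INTO t1 (start_time, end_time)
VALUES
(GETDATE(), GETDATE() - 1), -- end_time a day earlier -> 'end less'
(GETDATE(), GETDATE())      -- identical -> 'same'

SELECT * FROM t1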
Story
I have two approaches which perform the same task in different ways:
number_of_factors = 10
number_of_formula_portions = 5
number_of_travellers = 15
cycles = 2
Approach 1: call the DB multiple times to read the data (5*15*2 DB calls) and use Ehcache.
Approach 2: call the DB only once (one query containing 10*5*15*2 SQL union operations).
Logically the second one should perform better, because there is only one DB call and the time that saves. But in practice the second one takes more time to evaluate the query.
I have a dynamically generated union query. It has 10*5*15*2 (number_of_factors * number_of_formula_portions * number_of_travellers * cycles) union statements. When I run it, the DB takes too much time. But when I run it for one traveller via the application, it is fine. I thought that logically, reading all the data at once would perform better, but the DB gets stuck.
UNIT QUERY
select ? as FACTORNAME,
WEIGHTING,
? as KEYCYCLE,
? as KEYTRAVELLER,
? as KEYSUBFORMULA
from (
(SELECT *
FROM (
(SELECT ID,
ELEMENT_LOGIC_ID,
FACTOR_VALUE1,
FACTOR_VALUE2,
FACTOR_VALUE3,
FACTOR_VALUE4,
FACTOR_VALUE5,
FACTOR_VALUE6,
FACTOR_VALUE7,
WEIGHTING,
START_DATE,
END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =?
AND START_DATE <= ?
AND END_DATE >= ?
AND FACTOR_VALUE1=?
ORDER BY ID DESC )
)
WHERE ROWNUM <= 1)
)
PARAMETERS
F577(String), 0(String), 0(String), 1(String), 577(Long), 2018-06-28 00:00:00.0(Timestamp), 2018-06-28 00:00:00.0(Timestamp), 1(String),
SAMPLE UNION QUERY
select * from (
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1>? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1<? AND FACTOR_VALUE2=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
...
)
Note:
The part below varies dynamically in the query, depending on the factor match type [equal, lower bound, upper bound]. There are 7 factors: FACTOR_VALUE1, FACTOR_VALUE2, and so on. So I am not going to show you the actual SQL here; it is a 1.8 MB query.
equal
FACTOR_VALUE1=?
or lower bound
FACTOR_VALUE1<?
or upper bound
FACTOR_VALUE1>?
Business logic behind the scenes
Sorry for providing a sample query rather than the actual one. I am expecting comments on my approach.
It's like having exam results data:
There are 10 subjects in the school.
There are 15 students.
There are 2 term tests.
Those are in the DB.
This data can be read in 2 ways:
Read all the data at once and filter it at the application level [large union query].
Read one student's results for one term at a time [small query].
All ideas are welcome.
" I thought logically reading all data at once has a lot of performance, But DB is getting stuck."
Up to a point. One database call will likely be more efficient in terms of network traffic. But the actual call you make executes lots of queries and glues them together with UNION: so there is no performance gain to be had if the main problem is the performance of the individual queries.
One obvious change you can make: use UNION ALL rather than UNION if the subqueries are exclusive, and save yourself some unnecessary sorts.
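A trivial illustration of the difference (not your actual query): UNION must deduplicate across all branches, which typically costs a sort or hash over the whole intermediate result, while UNION ALL simply concatenates:
-- UNION would return a single row here (duplicate removed, extra sort/hash work);
-- UNION ALL returns both rows and skips the deduplication step entirely
SELECT 1 AS n FROM dual
UNION ALL
SELECT 1 AS n FROM dual;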
Beyond that, the logic of the subqueries looks suspect: you're hitting the same subset of rows each time, so you should consider using subquery factoring:
with cte as (
    SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2, FACTOR_VALUE3, FACTOR_VALUE4, FACTOR_VALUE5, FACTOR_VALUE6, FACTOR_VALUE7, WEIGHTING, START_DATE, END_DATE
    FROM ABC_PRICE_FACTOR
    WHERE ELEMENT_LOGIC_ID = ? AND START_DATE <= ? AND END_DATE >= ?
)
select ? as FACTORNAME, WEIGHTING, ? as KEYCYCLE, ? as KEYTRAVELLER, ? as KEYSUBFORMULA from (
    select weighting from (
        select weighting
        from cte
        where FACTOR_VALUE1 = ?
        order by id desc )
    where rownum <= 1
    union all
    select weighting from (
        select weighting
        from cte
        where FACTOR_VALUE1 > ?
        order by id desc )
    where rownum <= 1
    union all
    select weighting from (
        select weighting
        from cte
        where FACTOR_VALUE1 < ? AND FACTOR_VALUE2 = ?
        order by id desc )
    where rownum <= 1
    ...
)
Warning: tuning without understanding of the data volumes and distribution (skew), data structures or business rules - i.e. what you're asking us to do - is a mug's game. We're just guessing here, and the best you can hope for is that one of those guesses is lucky.
I think such a query can be optimized for quite a dramatic speed improvement. To achieve that, one must understand the logic behind it, though. On Stack Overflow, this is best done by providing a minimal example and some code.
Idea 1) - START_DATE, END_DATE
You've shown us only ?, so we don't know if the ? for the dates of all subqueries are the same. If so, you could filter down the table once in an initial step, without repeating the filtering 1500 times:
WITH datefiltered AS (SELECT * FROM ABC WHERE start_date <= ? AND end_date >= ?)
SELECT ... FROM datefiltered;
Idea 2) - UNION
Your pattern of UNIONing a lot of subqueries of the form SELECT * FROM (SELECT ... ORDER BY ...) WHERE rownum <= 1 is unusual. That is not a bad thing in itself, but it is likely that the database engine is not optimized for unusual queries.
You are using ORDER BY ID DESC ... WHERE ROWNUM <= 1, which means you are searching for the newest(?) row in a category.
The traditional pattern is to find a column (or more, or even an expression) and partition the query by it:
SELECT id, col1, col2
FROM (
    SELECT id, col1, col2,
        ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
    FROM ABC
)
WHERE my_rank <= 1;
In your case, the category is likely much more complex, but you can put that in a big CASE expression that groups your data the way your subqueries do:
CASE WHEN factor1=xx AND factor2>yy THEN 'mycat1'
WHEN factor3>zz AND factor2<yy THEN 'mycat2'
etc
END;
Putting all three together would look like:
SELECT id, col1, col2
FROM (
    SELECT id, col1, col2,
        ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
    FROM (
        SELECT id, col1, col2,
            CASE WHEN factor...
            END as mycategory
        FROM ABC
        WHERE start_date <= xx AND end_date >= yy
    )
)
WHERE my_rank <= 1;
I am trying to get data for all dates in a range provided by my query, but I'm only getting the dates that actually exist in my table - missing dates are not reported. I need to create records in the table for those missing dates, with other columns left null, and then include them in the results.
My table table_name has records like:
ID Name Date_only
---- ---- -----------
1234 xyz 01-Jan-2014
1234 xyz 02-Jan-2014
1234 xyz 04-Jan-2014
...
For example, for the range 01-Jan-2014 to 04-Jan-2014, my query is:
select * from table_name
where id=1234
and (date_only >= '01-Jan-14' and date_only <= '04-Jan-14')
From Java or queried directly this shows three rows, with no data for 03-Jan-2014.
I need a single statement to insert rows for any missing dates into the table and return the data for all four rows. How can I do that?
UPDATE
The following query worked only when there was just one record in the table, or for a search range of 2-5 days:
SELECT LEVEL, to_date('2014-11-08','yyyy-mm-dd') + level as day_as_date FROM DUAL CONNECT BY LEVEL <= 10
UPDATE WITH FIDDLE EXAMPLE
When I executed the same query against my table data, I got the error ORA-02393: exceeded call limit on CPU usage. A fiddle example with my own table is here: my owntable sqlfiddle example. Thanks in advance.
You can use the SQL below for your purpose. The SQL Fiddle is here: http://sqlfiddle.com/#!4/3ee61/27
with start_and_end_dates as (select min(onlydate) min_date
,max(onlydate) max_date
from mytable
where id='1001'
and onlydate >= to_date('01-Jan-2015','dd-Mon-YYYY')
and onlydate <= to_date('04-Jan-2015','dd-Mon-YYYY')),
missing_dates as (select min_date + level-1 as date_value
from start_and_end_dates connect by level <=(max_date - min_date) + 1)
select distinct id,name,date_value
from mytable,missing_dates
where id='1001'
order by date_value
EDIT 1: Using your other example. The SQL Fiddle is http://sqlfiddle.com/#!4/4c727/16
with start_and_end_dates as (select min(onlydate) min_date
,max(onlydate) max_date
from mytable
where name='ABCD'),
missing_dates as (select min_date + level-1 as date_value
from start_and_end_dates connect by level <=(max_date - min_date) + 1)
select distinct id,name,date_value
from mytable,missing_dates
where name='ABCD'
order by date_value;
You can use a query like
SELECT LEVEL, to_date('2014-01-01','yyyy-mm-dd') + level - 1 as day_as_date
FROM DUAL
CONNECT BY LEVEL <= 1000
to get a list of 1000 days starting from Jan 1 2014 (adjust to your needs; without the - 1, the list would start on Jan 2).
Next, do an INSERT from SELECT:
INSERT INTO table_name (date_only)
SELECT day_as_date FROM (<<THE_QUERY_ABOVE>>)
WHERE day_as_date NOT IN (SELECT date_only FROM table_name)
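Inlined, the full statement would look like this (a sketch; it assumes the 1000-day window above covers your range and that date_only never contains NULLs, since NOT IN behaves surprisingly with NULLs):
INSERT INTO table_name (date_only)
SELECT day_as_date
FROM (
    -- calendar of 1000 days starting 2014-01-01
    SELECT to_date('2014-01-01','yyyy-mm-dd') + LEVEL - 1 AS day_as_date
    FROM DUAL
    CONNECT BY LEVEL <= 1000
)
WHERE day_as_date NOT IN (SELECT date_only FROM table_name);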
I found a solution to the first part of my problem on this site: I need to get the count(*) value from multiple tables (Select count(*) from multiple tables).
My other problem is to get the DATE values from those tables that return a row count > 1. I have to compare the dates and get the latest. I have to query 12 tables. Assuming I get > 1 for tables 1, 2, 3, 4, and 5, I will need to compare their DATE values.
Sample code:
SELECT (SELECT COUNT(*) FROM table1) AS count1,
       (SELECT COUNT(*) FROM table2) AS count2,
       (SELECT COUNT(*) FROM table3) AS count3
FROM dual
count1 count2 count3
3 2 2
I need to select the MOST RECENT date in these tables.
Select date from table1 order by date desc
Select date from table2 order by date desc
Select date from table3 order by date desc
How am I supposed to do that without table scanning?
EDIT:
Okay. The instructions said
"Get matching records from tables 1-12 using the id as the key".
If there are no records, display "No record found"
Else: get the record with the latest date by comparing data_encoded from all tables.
It's pretty hard to tell what you're after, but here's a guess:
SELECT
(SELECT MAX(date) FROM table1) AS latest1,
(SELECT MAX(date) FROM table2) AS latest2,
(SELECT MAX(date) FROM table3) AS latest3,
-- etc
FROM dual;
You can avoid table scans by having indexes on date, in which case the optimizer should do index-only scans (very fast).
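If what you ultimately need is the single most recent date across all the tables rather than one per table, you can collapse the per-table maxima with GREATEST. A sketch (assuming Oracle, and that none of the tables is empty; MAX over an empty table yields NULL, and GREATEST propagates NULL):
SELECT GREATEST(
           (SELECT MAX(date) FROM table1),
           (SELECT MAX(date) FROM table2),
           (SELECT MAX(date) FROM table3)
           -- etc, up to table12
       ) AS most_recent
FROM dual;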
Have you tried grouping them, ordering by date, and selecting the first date?
select t1.id, t1.date, count(t1.date)
from table1 t1
inner join table2 t2 on t1.id = t2.id
inner join table3 t3 on t1.id = t3.id
etc..
group by t1.id, t1.date
order by t1.date desc
Something along those lines.