I am trying to get the latest record from the database (Apache Derby).
I have a BILL table in the database with a column BillId. The data type of BillId is varchar(15), and it is in the format:
3122022-1
The digits before the "-" (i.e., 3122022) come from the date (3/12/2022). The value after the "-" is the bill counter (i.e., 1).
The problem is that when I try to get the latest record from the database using max(BILLID), it considers 3122022-9 the maximum/latest record even when billId 3122022-10 or higher exists.
In simple words, it ignores the 0 or any other digit in the second position after the "-". Why is this happening and what is the solution?
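It looks like the VARCHAR values are being compared character by character rather than numerically. A quick check (a minimal sketch, runnable in Derby's ij) shows the same behavior:

-- string comparison: '3122022-9' sorts above '3122022-10'
VALUES CASE WHEN '3122022-9' > '3122022-10'
            THEN 'string 9 wins' ELSE 'string 10 wins' END;
-- returns 'string 9 wins', because '9' > '1' in the position after the "-"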
Here is the table structure:
[image: Bill table structure]
I used the following query:
select max(billId) as lastBill from Bill where empName='Hassan' and Date=Current Date;
empName matters because there are 4-5 employees and each has their own bill counter.
If I run this query:
select billid from bill order by empName desc;
I get this result:
[image: bill ids sorted by the empName column]
But if I run the max(billId) query, this is what I get:
select max(billId) as lastBill from Bill where empName='Hassan' and Date=Current Date;
[image: max(billid) results]
I hope I was able to explain my question well; I will be grateful for your help and support.
What I tried: max(billId)
I came up with a sample dataset and query.
-- PostgreSQL
with data as
(
select 'A' as emp_name,'03122022-1' as dated_on
union
select 'A' as emp_name,'03122022-2' as dated_on
union
select 'A' as emp_name,'03122022-3' as dated_on
union
select 'A' as emp_name,'03122022-4' as dated_on
union
select 'A' as emp_name,'03122022-5' as dated_on
union
select 'A' as emp_name,'03122022-6' as dated_on
)
,
data_clean as (
select emp_name,dated_on,
to_date((regexp_split_to_array (dated_on,'-'))[1],'DDMMYYYY') as bill_dated_on,
(regexp_split_to_array (dated_on,'-'))[2] ::int as bill_id
from data)
select emp_name,max(bill_id) from data_clean
where bill_dated_on = date '2022-12-03'
group by emp_name;
emp_name|max|
--------+---+
A | 6|
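Since the original table is in Derby, here is a rough Derby equivalent of the same idea, assuming BILLID always contains exactly one "-" followed by the counter:

-- extract the counter after the "-" and compare it numerically (Derby syntax)
SELECT MAX(CAST(SUBSTR(billId, LOCATE('-', billId) + 1) AS INT)) AS lastBill
FROM Bill
WHERE empName = 'Hassan' AND Date = CURRENT DATE;

This returns the numeric counter (e.g. 10) rather than the full bill id; the date prefix can be concatenated back on in the application.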
Story
I have two pieces of code that perform the same task in two different ways.
number_of_factors = 10
number_of_formula_portions = 5
number_of_travellers = 15
number_of_cycles = 2
Option 1: call the DB multiple times, reading data per traveller (5*15*2 = 150 DB calls), and use Ehcache.
Option 2: call the DB only once (one query containing 10*5*15*2 = 1500 SQL union operations).
Logically, the second one should perform better because there is only one DB call, saving time.
But in practice, the second one takes more time to evaluate the query.
I have a dynamically generated union query. It has 10*5*15*2 (number_of_factors * number_of_formula_portions * number_of_travellers * number_of_cycles) union statements. When I run it, the DB takes too much time, but when I run it for one traveller via the application, it is fine. I thought that logically, reading all the data at once would perform better, but the DB gets stuck.
UNIT QUERY
select ? as FACTORNAME,
WEIGHTING,
? as KEYCYCLE,
? as KEYTRAVELLER,
? as KEYSUBFORMULA
from (
(SELECT *
FROM (
(SELECT ID,
ELEMENT_LOGIC_ID,
FACTOR_VALUE1,
FACTOR_VALUE2,
FACTOR_VALUE3,
FACTOR_VALUE4,
FACTOR_VALUE5,
FACTOR_VALUE6,
FACTOR_VALUE7,
WEIGHTING,
START_DATE,
END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =?
AND START_DATE <= ?
AND END_DATE >= ?
AND FACTOR_VALUE1=?
ORDER BY ID DESC )
)
WHERE ROWNUM <= 1)
)
PARAMETERS
F577(String), 0(String), 0(String), 1(String), 577(Long), 2018-06-28 00:00:00.0(Timestamp), 2018-06-28 00:00:00.0(Timestamp), 1(String),
SAMPLE UNION QUERY
select * from (
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1>? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1<? AND FACTOR_VALUE2=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
...
)
Note:
The part below varies dynamically in the query, depending on the factor match type [equal, lower bound, upper bound]. There are 7 factors: FACTOR_VALUE1, FACTOR_VALUE2, and so on. So I am not going to show the actual SQL here; it is a 1.8 MB query.
equal
FACTOR_VALUE1=?
or lower bound
FACTOR_VALUE1<?
or upper bound
FACTOR_VALUE1>?
Business logic behind the scenes
Sorry, guys, for providing a sample query instead of the actual one. I am expecting comments on my approach.
It's like we have exam result data:
there are 10 subjects in the school,
there are 15 students,
there are 2 exam term tests,
and those are in the DB.
This data can be read in two ways:
read all the data at once and filter at the application level [large union query], or
read one student's one-term results one by one [small query].
All ideas are welcome.
" I thought logically reading all data at once has a lot of performance, But DB is getting stuck."
Up to a point. One database call will likely be more efficient in terms of network traffic. But the actual call you make executes lots of queries and glues them together with UNION: so there is no performance gain to be had if the main problem is the performance of the individual queries.
One obvious change you can make: use UNION ALL rather than UNION if the subqueries are exclusive, and save yourself some unnecessary sorts.
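A minimal illustration of the difference, using throwaway literals against dual:

-- UNION must eliminate duplicates, which costs an extra sort/hash step:
SELECT 1 AS n FROM dual UNION SELECT 1 AS n FROM dual;      -- one row
-- UNION ALL simply appends the result sets, no deduplication:
SELECT 1 AS n FROM dual UNION ALL SELECT 1 AS n FROM dual;  -- two rows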
Beyond that, the logic of the subqueries looks suspect: you're hitting the same subset of rows each time, so you should consider using subquery factoring:
with cte as (
SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ?
)
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from (
select weighting from (
select weighting
from cte
where FACTOR_VALUE1=?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1>?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1<? AND FACTOR_VALUE2=?
order by id desc )
where rownum <= 1
...
)
Warning: tuning without understanding of the data volumes and distribution (skew), data structures or business rules - i.e. what you're asking us to do - is a mug's game. We're just guessing here, and the best you can hope for is that one of those guesses is lucky.
I think such a query can be optimized with quite a dramatic speed improvement. To achieve that, one must understand the logic behind it, though. On Stack Overflow, this is best done by providing a minimal example and some code.
Idea 1) - START_DATE, END_DATE
You've shown us only ?, so we don't know whether the date ?s are the same for all subqueries. If so, you could filter the table down once in an initial step, instead of repeating the filter 1500 times:
WITH datefiltered AS (SELECT * FROM ABC WHERE start_date <= ? AND end_date >= ?)
SELECT ... FROM datefiltered;
Idea 2) - UNION
Your pattern of UNIONing a lot of subqueries of the form SELECT * FROM (SELECT ... ORDER BY ...) WHERE rownum <= 1 is unusual. That is not a bad thing in itself, but it is likely that the database engine is not optimized for unusual queries.
You are using ORDER BY ID DESC ... WHERE ROWNUM <= 1, which means you are searching for the newest(?) row in a category.
The traditional pattern is to find a column (or more, or even an expression) and partition the query by it:
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM ABC
)
WHERE my_rank <= 1;
In your case, the category is likely much more complex, but you can put that into a big CASE expression that groups your data into your subqueries:
CASE WHEN factor1=xx AND factor2>yy THEN 'mycat1'
WHEN factor3>zz AND factor2<yy THEN 'mycat2'
etc
END;
Putting all three ideas together would look like:
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM (
SELECT id, col1, col2,
CASE WHEN factor...
END as mycategory
FROM ABC
WHERE start_date <= xx AND end_date >= yy
)
)
WHERE my_rank <= 1;
I have three tables:
NEWS
news_id | other fields..
TAGS
tag_id | other fields..
and connecting table TAGS_NEWS
tag_id | news_id
I want to select data from the NEWS table by tags. The problem is that I have to select by many tags at once. The only solution I can come up with is to first select data by one tag, then filter that result by another tag, and so on. But I don't think that is a good way to solve this problem. Is there a better way? Can I select the necessary data in one query?
For example, the TAGS_NEWS table:
tag_id | news_id
     1 |       1
     3 |      12
     4 |      11
     1 |      10
     6 |       1
     7 |       2
     8 |       3
     9 |       3
    10 |       3
Select data by tags 1, 6
Get the result: news_id = 1
or select data by tags 8, 9
Get the result: news_id = 3
I think I got your problem, as I had to solve a similar one.
Try this solution:
select news.news_id
from news
join tags_news on tags_news.news_id = news.news_id
where tags_news.tag_id in (1, 6)
group by news.news_id
having count(*) = 2
The only things you have to pass to this query are the set of tag_ids (in my example 1, 6) and the number of tag_ids (in my example 2).
I've tested it with SQL Fiddle.
Try this solution:
select a.* from NEWS a, TAGS_NEWS b, TAGS c
where a.news_id = b.news_id
  and b.tag_id = c.tag_id
  and c.tag_id in ('id0001', 'id0002')
You only need to pass the tag ids you want.
select * from news where news_id in
(select news_id from tags_news where tag_id in
(select tag_id from tags where tag_name in ('name1', 'name2')))
Maybe it can help.
Please try this.
select n.news_id from news n join tags_news tn on n.news_id = tn.news_id where tn.tag_id in (1, 3, 6)
You can generalize sonnywhite's solution like the one below:
with tag_ids as (select tag_id from tags where tag_id in (1, 6)),
tag_id_cnt as (select count(1) cnt from tag_ids)
select * from news, tag_id_cnt where (news_id, cnt) in
(select news_id, count(1) from tags_news a, tag_ids b
 where a.tag_id = b.tag_id
 group by news_id)
I have a Java program that returns a list of Long values (hundreds of them).
I would like to subtract from this list the values obtained from a select on an Oracle database,
something like this:
SELECT 23 as num FROM DUAL UNION ALL
SELECT 17 as num FROM DUAL UNION ALL
SELECT 19 as num FROM DUAL UNION ALL
SELECT 67 as num FROM DUAL UNION ALL...
...
...
SELECT 68 as num FROM DUAL MINUS
SELECT NUM FROM MYTABLE
I presume that this operation has some performance issues...
Are there other better approaches?
Thank you
Case 1:
Use Global Temporary Tables (GTT):
CREATE GLOBAL TEMPORARY TABLE my_temp_table (
column1 NUMBER
) ON COMMIT DELETE ROWS;
Insert the list of Long values into my_temp_table:
INSERT ALL
INTO my_temp_table (column1) VALUES (27)
INTO my_temp_table (column1) VALUES (32)
INTO my_temp_table (column1) VALUES (25)
.
.
.
SELECT 1 FROM DUAL;
Then:
SELECT * FROM my_temp_table
WHERE column1 NOT IN (SELECT NUM FROM MYTABLE);
Let me know if you have any issue.
Case 2:
Use TYPE table:
CREATE TYPE number_tab IS TABLE OF number;
SELECT column_value AS num
FROM TABLE (number_tab(1,2,3,4,5,6)) temp_table
WHERE NOT EXISTS (SELECT 1 FROM MYTABLE WHERE MYTABLE.NUM = temp_table.column_value);
Assuming MYTABLE is much bigger than the list of literal values, I think the best option is to use a temporary table to store your values. This way your query is a lot cleaner.
If you are working in a concurrent environment (e.g. a typical web app), use an id field, and delete the rows when finished. Summarizing (a SQL sketch follows the list):
preliminary: create a table for temporary values, TEMPTABLE(id, value)
for each transaction:
get a new unique/atomic id (a new Oracle sequence value, for example)
for each literal value: insert into temptable (new_id, value)
select * from temptable where id = new_id minus ...
process the result
delete from temptable where id = new_id
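In plain SQL, a minimal sketch of those steps (TEMPTABLE and TEMPTABLE_SEQ are hypothetical names) might look like:

-- 1) get a new unique id for this transaction
SELECT temptable_seq.NEXTVAL FROM dual;             -- suppose it returns 42
-- 2) insert each literal value under that id
INSERT INTO temptable (id, value) VALUES (42, 23);
INSERT INTO temptable (id, value) VALUES (42, 17);
-- 3) subtract the values already in the database
SELECT value FROM temptable WHERE id = 42
MINUS
SELECT num FROM mytable;
-- 4) clean up when finished
DELETE FROM temptable WHERE id = 42;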
Temporary tables are a good solution in Oracle, and this approach can be used with an ORM persistence layer.
I have an employee table like:
Empid EmpName Remark
001 Bob
002 Harish
003 Tom
004 Dicky
001 Bob
003 Tom
I have to find the duplicate employee ids and update the Remark field to 'duplicate' accordingly.
Thanks.
update employee set remark = 'duplicate'
where empid in (
select empid
from employee
group by empid, empname
having count(*) > 1 )
This question is very vague, because you do not mention what ORM library you are using or how you are accessing/manipulating your database. But basically what you want to do is execute a derived table query, then make a decision based on the results.
SELECT * FROM
(SELECT empId, count(empId) numIds from Employee group by empId) IdCount
WHERE numIds > 1;
Run this query via a PreparedStatement or whatever your ORM framework provides, then iterate over each result and update your remark field.
The query below will give you the IDs of duplicate records:
select empID, empName, count(empID) as cnt from fschema.myTable group by empID, empName having count(empID) > 1 order by cnt
Will get back to you shortly on how to set the remark for duplicates...