ROW_NUMBER() not sequencing records correctly

I have to copy data from Oracle tables to files.
I have a join query that fetches 800k records, so I used the ROW_NUMBER() function with an ORDER BY clause to generate 4 files containing 200k records each.
Query:
SELECT * FROM (
SELECT ROW_NUMBER() OVER ( order by FILE_KEY desc ) rn,
FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO
FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc
WHERE loc.CATALOG_NO = rout.FILE_KEY)
WHERE rn BETWEEN start AND end;
Parameters:
For 1st File : start =1 ,end = 200000
For 2nd File : start =200001 ,end = 400000
For 3rd File : start =400001 ,end = 600000
For 4th File : start =600001 ,end = 800000
But when I ran the query below in a SQL query browser to check the last 10 rows, they differed from the last 10 rows of the file; that is, the sequence in the file is different from the sequence in the query browser.
SELECT * FROM (
SELECT ROW_NUMBER() OVER( order by FILE_KEY desc ) rn,
FILE_KEY,ROUTING_NO,INTLROUT_TYPE,ABBR_COUNTRY_CODE_2D,HO_CATALOG_NO
FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc
WHERE loc.CATALOG_NO=rout.FILE_KEY)
WHERE rn BETWEEN 799990 AND 800000;

This can happen when many rows share the same FILE_KEY, for example:
row_number file_key
799998 same_number
799999 same_number
800000 same_number
800001 same_number
800002 same_number
800003 same_number
800004 same_number
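Because you order only by FILE_KEY, the order among rows with the same key is not defined, so their row numbers can change from one execution to the next.
How did you notice that the data differ? Presumably from the other columns. So you can add them to the ordering as tie-breakers: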
SELECT ROW_NUMBER() OVER(order by FILE_KEY desc, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO ) rn
Or (second cause) your base tables changed between your queries.
UPDATE: you can use the use_hash hint to speed up your query. 5 hours is too long for this query.
SELECT * FROM (
SELECT /*+use_hash(rout loc)*/
ROW_NUMBER() OVER(order by FILE_KEY desc, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO ) rn,
FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO
FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc
WHERE loc.CATALOG_NO = rout.FILE_KEY)
WHERE rn BETWEEN start AND end;
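The tie problem described above can be reproduced outside the database. The following is a minimal Python sketch, not Oracle behaviour itself: the rows and the helper `number_rows` are made up for illustration, and Python's stable sort stands in for "ties keep whatever order the rows arrived in".

```python
# Hypothetical rows (file_key, routing_no); the values are made up for illustration.
run_1 = [(7, "A"), (7, "B"), (7, "C"), (5, "D")]
# The same rows, as if the database had produced them in a different physical order.
run_2 = [(7, "C"), (7, "A"), (5, "D"), (7, "B")]


def number_rows(rows, key):
    """Mimic ROW_NUMBER() OVER (ORDER BY ...): sort, then number from 1."""
    return list(enumerate(sorted(rows, key=key), start=1))


def by_file_key_only(row):
    # ORDER BY file_key DESC: tied rows keep their arrival order.
    return -row[0]


def by_file_key_and_routing(row):
    # ORDER BY file_key DESC, routing_no: the extra column breaks the ties.
    return (-row[0], row[1])


ambiguous_1 = number_rows(run_1, by_file_key_only)
ambiguous_2 = number_rows(run_2, by_file_key_only)
stable_1 = number_rows(run_1, by_file_key_and_routing)
stable_2 = number_rows(run_2, by_file_key_and_routing)

print(ambiguous_1 == ambiguous_2)  # False: same data, different numbering
print(stable_1 == stable_2)        # True: a unique sort key fixes the numbering
```

The same reasoning applies to the query: the numbering is only repeatable when the ORDER BY columns, taken together, identify each row uniquely.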

In the over clause, order by a unique field in BANK_INTL_LOC_TBL:
SELECT * FROM (
SELECT ROW_NUMBER() OVER ( order by loc.LOC_KEY desc ) rn,
FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO
FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc
WHERE loc.CATALOG_NO = rout.FILE_KEY)
WHERE rn BETWEEN start AND end
ORDER BY rn;
UPDATE: following @Shannon Severance's comment, I added the ORDER BY clause.

If you have disk to spare on your Oracle installation (which you should!), then instead of running the inner query 4 times it may end up being faster to do the following
CREATE TABLE bank_dump
NOLOGGING
PARALLEL 4
AS SELECT ROW_NUMBER() OVER ( order by FILE_KEY desc ) rn,
FILE_KEY, ROUTING_NO, INTLROUT_TYPE, ABBR_COUNTRY_CODE_2D, HO_CATALOG_NO
FROM BANK_INTL_ROUT_TBL rout, BANK_INTL_LOC_TBL loc
WHERE loc.CATALOG_NO = rout.FILE_KEY;
The amount of parallelism to use (the number 4 in my example here) will depend on how much concurrent work your database can handle, mostly dependent on the number of CPUs.
After that has finished (which should take noticeably less than 5 hours!), you can then run simple selects on the bank_dump table to pull the records you desire:
SELECT *
FROM bank_dump
where rn <= 200000
for your first data set, for example.
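As a rough illustration of "number the rows once, then read each range", here is a small sketch using Python's built-in `sqlite3` module in place of Oracle. The table `rout`, its ten made-up rows and the helper `chunk_bounds` are assumptions for the example, and the window function needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rout (file_key INTEGER, routing_no TEXT)")
conn.executemany(
    "INSERT INTO rout VALUES (?, ?)",
    [(k, f"R{k:03d}") for k in range(1, 11)],  # 10 made-up rows
)

# Number the rows once and keep the result (stand-in for CREATE TABLE ... AS SELECT).
conn.execute(
    """
    CREATE TABLE bank_dump AS
    SELECT ROW_NUMBER() OVER (ORDER BY file_key DESC, routing_no) AS rn,
           file_key, routing_no
    FROM rout
    """
)


def chunk_bounds(total, size):
    """Return inclusive (start, end) pairs covering 1..total."""
    return [(start, min(start + size - 1, total))
            for start in range(1, total + 1, size)]


total = conn.execute("SELECT COUNT(*) FROM bank_dump").fetchone()[0]
chunks = []
for start, end in chunk_bounds(total, 4):
    # Each "file" is a cheap range scan over the stored numbering.
    rows = conn.execute(
        "SELECT rn, file_key FROM bank_dump WHERE rn BETWEEN ? AND ? ORDER BY rn",
        (start, end),
    ).fetchall()
    chunks.append(rows)

print([len(c) for c in chunks])  # [4, 4, 2]
```

Because the numbering is stored, every range is read from the same snapshot, which also avoids the numbers shifting between the four runs.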

Related

Postgres query to find multiple records with a particular repeat count within a table

I have 2 tables Customer and Orders.
1st question:
That is a master table for customers with columns such as customer number, customer name, active flag, etc. The table may contain 2 or more records for the same customer number, but per the business logic only 1 record at a time should be ACTIVE. I need to find customers that have only 1 record, and that record should be active.
Query that I have written:
select customer_number, count(*)
from customers c
where active = false
group by customer_number
having count(*) = 1;
This returns me the customers that have 2 records and only 1 is NOT ACTIVE.
Question 2:
Apart from the Customers table we have an Orders table, which contains columns such as customer number (the same as in the Customers table), delivery date, order number, and insert time.
I need to find the customers whose ACTIVE flag is false and who have not placed any orders in the last 180 days (INSERT TIME::date - 180).
What I have tried does not give the desired output; on back-testing I found that the data is wrong:
select om.customer_number,
c.customer_name,
om.deliverydate,
om.insert_time
from customers c, order_master om
where
om.customer_number in
(
select c2.customer_number
from customers c2
where c2.active = false
group by c2.customer_number
having count(*) =1
)
and c.customer_number = om.customer_number
group by om.customer_number, c.customer_name,
om.deliverydate, om.insert_time
having max(om.insert_time::date) < '2022-06-01' ;
The queries that I have tried, I have already mentioned them in my question. Please check that.

For the first question (find customers that have only 1 record, which must be active), you may use conditional aggregation or a filtered count, as follows:
select customer_number
from Customers c
group by customer_number
having count(*) = 1 and count(*) filter (where active) = 1;
For the second question (find the customers whose ACTIVE is false and who have not placed any orders in the last 180 days), try the following:
select cu.customer_number
from order_master om join
(
select customer_number
from Customers c
group by customer_number
having count(*) filter (where active) = 0
) cu
on om.customer_number = cu.customer_number
group by cu.customer_number
having max(om.insert_time) < current_date - interval '180 day'
If you want to get all order details for the inactive customers, you may join the above query with the orders table as follows:
with inactive_cust as
(
select cu.customer_number, cu.customer_name
from order_master om join
(
select customer_number, customer_name
from Customers c
group by customer_number, customer_name
having count(*) filter (where active) = 0
) cu
on om.customer_number = cu.customer_number
group by cu.customer_number, cu.customer_name
having max(om.insert_time) < current_date - interval '180 day'
)
select c.customer_number, c.customer_name,
o.order_number, o.insert_time
from inactive_cust c join order_master o
on c.customer_number = o.customer_number
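The conditional-aggregation idea from the first query can be tried on a toy table. This is only a sketch: it uses Python's `sqlite3` module rather than Postgres, the three customers are invented, and `SUM(CASE ...)` is used in place of `count(*) filter (...)` so that it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_number INTEGER, active INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        (1, 1),          # one record, active      -> wanted
        (2, 0),          # one record, inactive
        (3, 1), (3, 0),  # two records, one of them active
    ],
)

single_active = [
    row[0]
    for row in conn.execute(
        """
        SELECT customer_number
        FROM customers
        GROUP BY customer_number
        HAVING COUNT(*) = 1
           AND SUM(CASE WHEN active = 1 THEN 1 ELSE 0 END) = 1
        ORDER BY customer_number
        """
    )
]
print(single_active)  # [1]
```

Grouping over all records first, and only then counting the active ones, is what distinguishes this from filtering with `where active = false` before the grouping.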

@Ahmed, both of your queries worked fine.
However, I want to fetch additional columns in the 2nd query, so I did this:
select om.customer_number, cu.customer_name, om.order_number ,om.insert_time
from order_master om join
(
select customer_number, customer_name
from Customers c
group by customer_number, customer_name
having count(*) filter (where active) = 0
) cu
on om.customer_number = cu.customer_number
group by om.customer_number , cu.customer_name, om.insert_time,om.order_number
having max(om.insert_time) < current_date - interval '180 day';
When I tried the query you shared:
select om.customer_number
from order_master om join
(
select customer_number
from Customers c
group by customer_number
having count(*) filter (where active) = 0
) cu
on om.customer_number = cu.customer_number
group by om.customer_number
having max(om.insert_time) < current_date - interval '180 day';
It gives me around 4K results, but with my modifications the result count grows with each column I add to the query, up to 75K and more.
Also, it shows me records for which max(om.insert_time) is much greater than 180 days.
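One likely explanation, sketched here under assumptions: once order_number and insert_time are part of the GROUP BY, every group is a single order, so the HAVING test is applied per order instead of per customer. The toy data, the fixed cutoff date and the use of Python's `sqlite3` module (instead of Postgres) are all made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_master "
    "(customer_number INTEGER, order_number INTEGER, insert_time TEXT)"
)
conn.executemany(
    "INSERT INTO order_master VALUES (?, ?, ?)",
    [
        (1, 101, "2021-01-10"),  # old order
        (1, 102, "2023-05-01"),  # recent order -> customer 1 is not dormant
        (2, 201, "2021-02-15"),  # only old orders -> customer 2 is dormant
        (2, 202, "2021-03-20"),
    ],
)
cutoff = "2022-06-01"  # fixed cutoff; ISO dates compare correctly as text

# One group per customer: MAX() looks at all of the customer's orders.
per_customer = conn.execute(
    "SELECT customer_number FROM order_master "
    "GROUP BY customer_number HAVING MAX(insert_time) < ? "
    "ORDER BY customer_number",
    (cutoff,),
).fetchall()

# One group per order: MAX() only ever sees a single row.
per_order = conn.execute(
    "SELECT customer_number, order_number FROM order_master "
    "GROUP BY customer_number, order_number, insert_time "
    "HAVING MAX(insert_time) < ? ORDER BY order_number",
    (cutoff,),
).fetchall()

# Decide per customer first, then join back for the order details.
cte_then_join = conn.execute(
    """
    WITH dormant AS (
        SELECT customer_number FROM order_master
        GROUP BY customer_number HAVING MAX(insert_time) < ?
    )
    SELECT o.customer_number, o.order_number
    FROM dormant d JOIN order_master o ON o.customer_number = d.customer_number
    ORDER BY o.order_number
    """,
    (cutoff,),
).fetchall()

print(per_customer)   # [(2,)]
print(per_order)      # [(1, 101), (2, 201), (2, 202)]
print(cte_then_join)  # [(2, 201), (2, 202)]
```

In the per-order variant, customer 1 shows up through the old order 101 even though that customer ordered recently. Deciding per customer first and joining the orders afterwards, as the CTE form of the answer does, keeps the check at customer level.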

Oracle nested select with defined rows

My query looks like this:
SELECT
nvl(dd,'TOTAL') "Subject",
SUM(cnt) "Count",
SUM(pct) AS "%"
FROM
(
SELECT
dd,
COUNT(1) cnt,
round(RATIO_TO_REPORT(COUNT(1) ) OVER() * 100,2) AS pct
FROM
student p,
student_subject a
WHERE
p.sId = a.sId
AND student_type IN (
'1',
'2'
)
AND dd IN (
'MATH',
'SCIENCE',
'HISTORY'
)
GROUP BY
dd
ORDER BY
1
)
GROUP BY
ROLLUP(dd)
ORDER BY
1;
My Output should look like this:
Subject Count %
MATH 33 23.2%
SCIENCE 24 11.46%
HISTORY 56 44.778%
TOTAL 113 85.42%
If a particular subject doesn't have data, the query should still return its row with 0 values, like below:
Subject Count %
MATH 33 23.20%
SCIENCE 0 0.00%
HISTORY 56 44.77%
TOTAL 113 85.42%
What I am getting right now is below, with no SCIENCE row, which is not desired:
Subject Count %
MATH 33 23.20%
HISTORY 56 44.77%
TOTAL 113 85.42%
What I tried was removing the dd IN clause ("AND dd IN ('MATH', 'SCIENCE', 'HISTORY')").
However, I cannot work out how to add another inner select that supplies the 3 subjects.

If I understand the data model correctly, when a student is not enrolled in a subject, no entry for that subject exists in the student_subject table, which means the missing subject is not present in the deficit table either. Hence it is technically not possible to join these two tables and report a column value that exists in neither of them.
To solve this, I use a WITH clause to create another table holding all the desired subjects and outer join it with the result set retrieved.
I have tested this and it works. The complete solution (Oracle 18c) with tables and query can be found in this DBFIDDLE: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=df73453d7fa4e0478e74fa509b20a411.
WITH some_data AS (
SELECT 'MATH' AS subj
FROM dual
UNION ALL
SELECT 'SCIENCE' AS subj
FROM dual
UNION ALL
SELECT 'HISTORY' AS subj
FROM dual
)
SELECT
nvl(subj,'TOTAL') "Subject",
nvl(SUM(cnt),0) "Count",
nvl(SUM(pct),0) AS "%"
FROM
(SELECT
dd,
COUNT(1) cnt,
round(RATIO_TO_REPORT(COUNT(1) ) OVER() * 100,2) AS pct
FROM
student p,
student_subject a
WHERE
p.sId = a.sId
AND student_type IN (
'1',
'2'
)
AND dd IN (
'MATH',
'SCIENCE',
'HISTORY'
)
GROUP BY
dd
ORDER BY
1
) tab, some_data
where tab.dd(+) = some_data.subj
GROUP BY
ROLLUP(subj)
ORDER BY
1;
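The outer-join idea can be checked on a toy data set. The following sketch uses Python's `sqlite3` module instead of Oracle, with a made-up `student_subject` table in which nobody takes SCIENCE; it covers only the counting part, not the percentage or the ROLLUP total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_subject (sid INTEGER, dd TEXT)")
conn.executemany(
    "INSERT INTO student_subject VALUES (?, ?)",
    [(1, "MATH"), (2, "MATH"), (3, "HISTORY")],  # nobody takes SCIENCE
)

rows = conn.execute(
    """
    WITH subjects(subj) AS (
        SELECT 'MATH' UNION ALL SELECT 'SCIENCE' UNION ALL SELECT 'HISTORY'
    )
    SELECT s.subj, COUNT(a.sid) AS cnt   -- COUNT(column) ignores the NULLs
    FROM subjects s
    LEFT JOIN student_subject a ON a.dd = s.subj
    GROUP BY s.subj
    ORDER BY s.subj
    """
).fetchall()

print(rows)  # [('HISTORY', 1), ('MATH', 2), ('SCIENCE', 0)]
```

The fixed subject list drives the join, so every subject yields a row; counting a column from the outer-joined side turns the unmatched subject into 0 rather than dropping it.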

You need to use the list of subjects as the driving inline view and left join the other tables to it. The STUDENT_TYPE filter belongs in the join condition (in the WHERE clause it would discard the subjects that have no matching rows), and the count should be taken on a joined column so that those subjects get 0:
SELECT NVL(SUB, 'TOTAL') "Subject",
SUM(CNT) "Count",
SUM(PCT) AS "%"
FROM (
SELECT SUBJECTS.SUB,
COUNT(P.SID) CNT,
ROUND(RATIO_TO_REPORT(COUNT(P.SID)) OVER() * 100, 2) AS PCT
FROM (
SELECT 'MATH' AS SUB FROM DUAL UNION ALL
SELECT 'SCIENCE' AS SUB FROM DUAL UNION ALL
SELECT 'HISTORY' AS SUB FROM DUAL
) SUBJECTS
LEFT JOIN STUDENT_SUBJECT A
ON SUBJECTS.SUB = A.DD
LEFT JOIN STUDENT P
ON P.SID = A.SID
AND STUDENT_TYPE IN (
'1','2'
)
GROUP BY SUBJECTS.SUB
)
GROUP BY ROLLUP(SUB)
ORDER BY 1;

You have to use a CASE expression to default the value to 0 when a subject's value is null. Let me know if you need a query; I am only suggesting the logic so that you can try it yourself.

ORA-00936, Java and SQL

I am aware that I have a missing expression, but I cannot tell where it is.
This is the string I send to the parser:
#macro(intervals $startDate $endDate) SELECT #bind($startDate 'TIMESTAMP') - 1 + LEVEL interv, 1+EXTRACT(DAY FROM #bind($endDate 'TIMESTAMP')-#bind($startDate 'TIMESTAMP')) days_in_period FROM dual CONNECT BY LEVEL <= 1+EXTRACT(DAY FROM #bind($endDate 'TIMESTAMP')- #bind($startDate 'TIMESTAMP'))#end
#macro(tagSeries $dso $includeSetInfo) SELECT DISTINCT t.tag_group #if($includeSetInfo) , t.set_id tag_set_id, t.name tag_set_name #end , tsi.tag_id FROM TRANSPONDER_SET t, TRANSPONDER_SET_ID TSI WHERE t.set_id IN ( #if($TAG_SET_ID) #bind($TAG_SET_ID 'VARCHAR') #else #foreach($transponderSet in $dso.getTransponderSet() ) #if($velocityCount > 1) , #end #bind($transponderSet.getSetId() 'VARCHAR') #end #end ) AND t.set_id = tsi.set_id#end
#set( $data_opt = $data_selection)
WITH
intervals AS (#intervals($PERIOD_START_DATE $PERIOD_END_DATE)),
tags AS (#tagSeries($data_opt false)),
ag_regions AS ( SELECT /* materialize */
node_id, CONNECT_BY_ROOT ag2.name region_name
FROM area_group ag2
START WITH ag2.parent_node_id = 1
CONNECT BY NOCYCLE PRIOR ag2.node_id = ag2.parent_node_id),
stations AS (SELECT distinct ag.abbreviation,
ag.node_id,
rp.station_key,
ag.sorting
, CASE s.equipment_type_key WHEN 2 THEN s.station_name ELSE NULL END pk_number
,(SELECT region_name
FROM ag_regions
WHERE node_id = ag.node_id) region
FROM reg_point_view rp, area_group ag, station s
WHERE rp.area_group_node_id = ag.node_id
#if($station_region_options.isStations())
AND rp.station_key IN (#bind($station_region_options.getEntries() 'NUMERIC'))
#elseif($station_region_options.isPartitionNodes())
AND ag.node_id IN (#bind($station_region_options.getEntries() 'NUMERIC'))
#elseif($station_region_options.isTreeNodes())
AND ag.node_id IN (SELECT node_id
FROM area_group
START WITH NODE_ID IN (#bind($station_region_options.getEntries() 'NUMERIC'))
CONNECT BY PRIOR NODE_ID = PARENT_NODE_ID
AND NODE_ID <> PARENT_NODE_ID)
#end
AND (
rp.ant_ikraft_dato <= #bind($PERIOD_END_DATE 'TIMESTAMP') -- period end date
and (rp.ant_udlobs_dato is null
or rp.ant_udlobs_dato >= #bind($PERIOD_START_DATE 'TIMESTAMP') -- period start date
)
)
AND rp.station_key = s.station_key
),
pre_aggregation AS
(SELECT TRUNC(vd.reg_time) day, s.abbreviation, s.node_id, s.pk_number, s.region, count(*) regs, s.sorting
FROM stations s, validated_data vd, tags t
WHERE vd.station_key = s.station_key
AND vd.tag_id = t.tag_id
AND vd.tag_group = t.tag_group
AND vd.reg_time BETWEEN #bind($PERIOD_START_DATE 'TIMESTAMP') AND #bind($PERIOD_END_DATE 'TIMESTAMP')
GROUP BY TRUNC(vd.reg_time), s.abbreviation, s.node_id,s.pk_number, s.region, s.sorting)
SELECT abbreviation, node_id, pk_number, region, 100*sum(days_read)/(select count(*) from intervals) utilization from (
SELECT abbreviation, node_id, pk_number, region, sorting, sum(decode(regs,0,0,1)) days_read FROM pre_aggregation
GROUP BY abbreviation, node_id, pk_number, region, sorting, regs
UNION ALL SELECT DISTINCT abbreviation, node_id, pk_number, region, sorting,0 days_read FROM stations)
GROUP BY abbreviation, pk_number, region, node_id, sorting
-- Just to test
ORDER BY utilization desc, sorting
Can you find it? My console prints the following:
ORDER BY utilization desc sorting]; nested exception is java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
The full output from my console is:
Class: class org.springframework.jdbc.BadSqlGrammarException
Message: StatementCallback; bad SQL grammar [WITH
intervals AS ( SELECT to_timestamp('01-01-2013 00:00:00','dd-mm-yyyy hh24:mi:ss') - 1 + LEVEL interv, 1+EXTRACT(DAY FROM to_timestamp('30-01-2013 23:59:59','dd-mm-yyyy hh24:mi:ss')-to_timestamp('01-01-2013 00:00:00','dd-mm-yyyy hh24:mi:ss')) days_in_period FROM dual CONNECT BY LEVEL <= 1+EXTRACT(DAY FROM to_timestamp('30-01-2013 23:59:59','dd-mm-yyyy hh24:mi:ss')- to_timestamp('01-01-2013 00:00:00','dd-mm-yyyy hh24:mi:ss'))),
tags AS ( SELECT DISTINCT t.tag_group , tsi.tag_id FROM TRANSPONDER_SET t, TRANSPONDER_SET_ID TSI WHERE t.set_id IN ( ) AND t.set_id = tsi.set_id),
ag_regions AS ( SELECT /* materialize */
node_id, CONNECT_BY_ROOT ag2.name region_name
FROM area_group ag2
START WITH ag2.parent_node_id = 1
CONNECT BY NOCYCLE PRIOR ag2.node_id = ag2.parent_node_id),
stations AS (SELECT distinct ag.abbreviation,
ag.node_id,
rp.station_key,
ag.sorting
, CASE s.equipment_type_key WHEN 2 THEN s.station_name ELSE NULL END pk_number
,(SELECT region_name
FROM ag_regions
WHERE node_id = ag.node_id) region
FROM reg_point_view rp, area_group ag, station s
WHERE rp.area_group_node_id = ag.node_id
AND rp.station_key IN (1549)
AND (
rp.ant_ikraft_dato <= to_timestamp('30-01-2013 23:59:59','dd-mm-yyyy hh24:mi:ss') -- period end date
and (rp.ant_udlobs_dato is null
or rp.ant_udlobs_dato >= to_timestamp('01-01-2013 00:00:00','dd-mm-yyyy hh24:mi:ss') -- period start date
)
)
AND rp.station_key = s.station_key
),
pre_aggregation AS
(SELECT TRUNC(vd.reg_time) day, s.abbreviation, s.node_id, s.pk_number, s.region, count(*) regs, s.sorting
FROM stations s, validated_data vd, tags t
WHERE vd.station_key = s.station_key
AND vd.tag_id = t.tag_id
AND vd.tag_group = t.tag_group
AND vd.reg_time BETWEEN to_timestamp('01-01-2013 00:00:00','dd-mm-yyyy hh24:mi:ss') AND to_timestamp('30-01-2013 23:59:59','dd-mm-yyyy hh24:mi:ss') GROUP BY TRUNC(vd.reg_time), s.abbreviation, s.node_id,s.pk_number, s.region, s.sorting)
SELECT abbreviation, node_id, pk_number, region, 100*sum(days_read)/(select count(*) from intervals) utilization from (
SELECT abbreviation, node_id, pk_number, region, sorting, sum(decode(regs,0,0,1)) days_read FROM pre_aggregation
GROUP BY abbreviation, node_id, pk_number, region, sorting, regs
UNION ALL SELECT DISTINCT abbreviation, node_id, pk_number, region, sorting,0 days_read FROM stations)
GROUP BY abbreviation, pk_number, region, node_id, sorting
--Dette er lige lidt test kode!
ORDER BY utilization desc sorting]; nested exception is java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
I have had a look at "java.sql.SQLException: ORA-00936: missing expression" and "ORA-00936: missing expression oracle",
but I can't figure out whether there is a comma too many or a misspelling somewhere.

This part:
SELECT DISTINCT t.tag_group, tsi.tag_id
FROM TRANSPONDER_SET t, TRANSPONDER_SET_ID TSI
WHERE t.set_id IN ( ) AND t.set_id = tsi.set_id
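No values are specified for t.set_id, and IN () is not valid. In the template, that list is filled from $TAG_SET_ID or from $dso.getTransponderSet(), so it renders empty when neither yields a value.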
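One way to guard against an empty list before the SQL is generated is sketched below in Python. The helper `in_clause` is hypothetical and not part of the template engine or of Spring; substituting an always-false predicate is a design choice, and raising an error may suit this report better.

```python
def in_clause(column, values):
    """Build "column IN (?, ?, ...)" plus its bind parameters.

    Hypothetical helper: an empty list yields the always-false predicate
    "1 = 0" instead of the invalid "column IN ()".
    """
    values = list(values)
    if not values:
        return "1 = 0", []
    placeholders = ", ".join("?" for _ in values)
    return f"{column} IN ({placeholders})", values


sql_some, params_some = in_clause("t.set_id", ["A1", "B2"])
sql_none, params_none = in_clause("t.set_id", [])

print(sql_some)  # t.set_id IN (?, ?)
print(sql_none)  # 1 = 0
```

With the always-false predicate the statement stays valid and simply returns no tags, which may or may not be the intended result when no transponder set is selected.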

SQLite - Tricky query for max(column)

I have built a schema with SQL Fiddle:
SQL Fiddle - Schema
We have these columns in the testtable:
id [int] as primary key (not used -> not important)
end [int] - when a new stream is written into the table, all rows but the last have the value '0' and the last one has the value '1'. This marks where the input stream ends.
time_abs [int] - an absolute time (e.g. in minute-steps).
r_m [double] - the mass rate summed up over time
T_r [double] - does not matter
type [string] - also does not matter here
x0 [string] - departure (e.g. where does the water come from?)
x1 [string] - destination (e.g. where does the water flow in?)
As you can see in the SQL Fiddle schema, we query the mass at a certain location and at a certain time like this:
SELECT
(SELECT (SELECT total(r_m)
FROM testtable
WHERE time_abs=11 AND end=0 AND x1='vessel2') +
(SELECT total(r_m)
FROM testtable
WHERE end=1 AND time_abs <=11 AND x1='vessel2')
)
-
(SELECT (SELECT total(r_m)
FROM testtable
WHERE time_abs=11 AND end=0 AND x0='vessel2') +
(SELECT total(r_m)
FROM testtable
WHERE end=1 AND time_abs <=11 AND x0='vessel2')
)
This works well and fast.
But what we now want to query is the maximum of r_m within a certain time range.
E.g. pseudo code:
SELECT max(total(r_m))
WHERE time_abs BETWEEN 1 AND 30 & SELECT time_abs WHERE r_m=max ...
So that the result of this pseudo query is (123, 13-24) (max(total mass), time span where total mass=max) (manually checked at the SQL Fiddle Schema).
Any ideas?

Here's a query that shows the level in vessel 2 between 5 and 26 seconds:
select times.time_abs
, sum(
case when x1 = 'vessel2' and ([end] = 1 or times.time_abs = tt.time_abs)
then r_m else 0 end -
case when x0 = 'vessel2' and ([end] = 1 or times.time_abs = tt.time_abs)
then r_m else 0 end
) as lvl
from (
select distinct time_abs
from testtable
where time_abs between 5 and 26
) times
join testtable tt
on tt.time_abs <= times.time_abs
and 'vessel2' in (tt.x0, tt.x1)
group by
times.time_abs
To just display the maximum, you can:
select max(lvl)
from (
...query from above...
) as SubQueryAlias
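The question also asks for the time span during which the level equals its maximum, which `max(lvl)` alone does not return. Below is a plain-Python sketch of the same level rule as the query above (a row counts if it is a finished stream, or if it is the running value at exactly that time). The rows are invented and are not the SQL Fiddle data, and `level_at` and `max_level_span` are hypothetical helper names.

```python
from collections import namedtuple

Row = namedtuple("Row", "end time_abs r_m x0 x1")

# Made-up streams: one into vessel2, one out of it, one unrelated.
rows = [
    Row(0, 1, 10.0, "src", "vessel2"),
    Row(0, 2, 20.0, "src", "vessel2"),
    Row(1, 3, 30.0, "src", "vessel2"),   # inflow finished with 30 in total
    Row(0, 4, 1.0, "src", "vessel3"),    # unrelated, only contributes a time step
    Row(0, 5, 5.0, "vessel2", "sink"),
    Row(1, 6, 12.0, "vessel2", "sink"),  # outflow finished with 12 in total
]


def level_at(rows, vessel, t):
    """Inflow minus outflow of `vessel` at time t, using the query's rule."""
    level = 0.0
    for r in rows:
        if r.time_abs <= t and (r.end == 1 or r.time_abs == t):
            if r.x1 == vessel:
                level += r.r_m
            if r.x0 == vessel:
                level -= r.r_m
    return level


def max_level_span(rows, vessel, t_from, t_to):
    """Return (peak level, list of time steps at which it is reached)."""
    times = sorted({r.time_abs for r in rows if t_from <= r.time_abs <= t_to})
    if not times:
        return None, []
    levels = {t: level_at(rows, vessel, t) for t in times}
    peak = max(levels.values())
    return peak, [t for t in times if levels[t] == peak]


peak, peak_times = max_level_span(rows, "vessel2", 1, 30)
print(peak, peak_times)  # 30.0 [3, 4]
```

In SQL, the equivalent would be to keep the per-time query as a subquery and select the rows whose lvl equals the maximum, then take the smallest and largest time_abs among them.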

SQL removing arithmetic from GROUP BY hibernate

I currently have a query that works fine in SQL, but I need it to run in Hibernate. It finds the number of records of 2 fields inside of a time window. However, a quick google search will tell you that you can't have an arithmetic expression in the GROUP BY clause in hibernate.
My table/SQL:
CREATE TABLE T (
Field_1 CHAR(4),
Field_2 CHAR(4),
Time_Stamp Date
);
--Populated
DECLARE #DATE1 DATE, #DATE2 DATE, #INTERVAL INT;
#DATE1 = TO_DATE('2013-01-01', 'yyyy-MM-dd');
#DATE2 = TO_DATE('2013-01-02', 'yyyy-MM-dd');
#INTERVAL = 15;
SELECT * FROM (
SELECT COUNT(Field_1) as field1,
COUNT(Field_2) as field2,
FLOOR( TO_NUMBER( TO_CHAR( Time_Stamp, 'hhmi' ) ) / #INTERVAL ) as timeInterval
FROM T
WHERE Time_Stamp BETWEEN #DATE1 AND #DATE2
GROUP BY FLOOR( TO_NUMBER( TO_CHAR( Time_Stamp, 'hhmi' ) ) / #INTERVAL )
) ORDER BY timeInterval ASC;
As you can see, I can keep arithmetic out of the ORDER BY by using a subquery. Is it possible to do something similar with the GROUP BY clause, or is there some other workaround?
I have already written the query in Hibernate and can translate between the two; I only need a query structure in which the arithmetic is outside the GROUP BY clause. SQL syntax is fine.

Perform your calculation in an inner query, then group by its result.
SELECT
COUNT(Field_1) as field1,
COUNT(Field_2) as field2,
timeInterval
FROM (
SELECT
Field_1,
Field_2,
FLOOR( TO_NUMBER( TO_CHAR( Time_Stamp, 'hhmi' ) ) / #INTERVAL ) as timeInterval
FROM T
WHERE Time_Stamp BETWEEN #DATE1 AND #DATE2
)
GROUP BY timeInterval
ORDER BY timeInterval ASC;
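The structure of that query can be exercised on a small table. This sketch uses Python's `sqlite3` module and says nothing about Hibernate itself; the table contents are invented, and a plain `minute_of_day` integer column is assumed in place of the TO_CHAR(Time_Stamp, 'hhmi') expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field_1 TEXT, field_2 TEXT, minute_of_day INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "x", 0),
        ("b", None, 14),
        ("c", "y", 15),
        ("d", "z", 44),
        ("e", None, 45),
    ],
)

interval = 15
rows = conn.execute(
    """
    SELECT COUNT(field_1) AS field1,
           COUNT(field_2) AS field2,
           time_interval
    FROM (
        -- the arithmetic lives here, in the inner query
        SELECT field_1, field_2, minute_of_day / ? AS time_interval
        FROM t
    )
    GROUP BY time_interval   -- plain column reference, no expression
    ORDER BY time_interval
    """,
    (interval,),
).fetchall()

print(rows)  # [(2, 1, 0), (1, 1, 1), (1, 1, 2), (1, 0, 3)]
```

The outer query only ever refers to `time_interval` by name, which is the property the question asks for.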
