Oracle Performance Limitation of SQL UNION query & how to optimize - java

story
I have two pieces of code that perform the same task in different ways.
number_of_factors =10
number_of_formula_portions=5
number_of_travellers=15
cycles=2
Approach 1: call the DB multiple times to read the data (5*15*2 DB calls) and use Ehcache.
Approach 2: call the DB only once (one query containing 10*5*15*2 SQL UNION operations).
Logically the second one should perform better because there is only one DB call, saving round trips.
But in practice the second one takes more time to evaluate the query.
I have a dynamically generated union query. It has 10*5*15*2 (number_of_factors * number_of_formula_portions * number_of_travellers * number_of_cycles) union statements. When I run it, the DB takes too much time. But when I run it for one traveller via the application, it is fine. I thought that logically, reading all the data at once would perform much better, but the DB is getting stuck.
UNIT QUERY
select ? as FACTORNAME,
WEIGHTING,
? as KEYCYCLE,
? as KEYTRAVELLER,
? as KEYSUBFORMULA
from (
(SELECT *
FROM (
(SELECT ID,
ELEMENT_LOGIC_ID,
FACTOR_VALUE1,
FACTOR_VALUE2,
FACTOR_VALUE3,
FACTOR_VALUE4,
FACTOR_VALUE5,
FACTOR_VALUE6,
FACTOR_VALUE7,
WEIGHTING,
START_DATE,
END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =?
AND START_DATE <= ?
AND END_DATE >= ?
AND FACTOR_VALUE1=?
ORDER BY ID DESC )
)
WHERE ROWNUM <= 1)
)
PARAMETERS
F577(String), 0(String), 0(String), 1(String), 577(Long), 2018-06-28 00:00:00.0(Timestamp), 2018-06-28 00:00:00.0(Timestamp), 1(String),
SAMPLE UNION QUERY
select * from (
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1>? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1<? AND FACTOR_VALUE2=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
...
)
note:
The part below differs dynamically in the query, depending on the factor match type [equal, lower bound, upper bound]. There are 7 factors: FACTOR_VALUE1, FACTOR_VALUE2, and so on. So I am not going to show you the actual SQL here; it is a 1.8 MB query.
equal
FACTOR_VALUE1=?
or lower bound
FACTOR_VALUE1<?
or upper bound
FACTOR_VALUE1>?
business logic behind the scenes
Sorry for providing only a sample query instead of the actual one. I am expecting comments on my approach.
It's like we have data of exam results:
there are 10 subjects in school.
there are 15 students.
there are 2 exam term tests.
those are in DB.
This data can be read in 2 ways:
read all data at once, and filter at the application level [large union query]
read one student's results for one term, one at a time [small queries]
All ideas are welcome.

" I thought logically reading all data at once has a lot of performance, But DB is getting stuck."
Up to a point. One database call will likely be more efficient in terms of network traffic. But the actual call you make executes lots of queries and glues them together with UNION: so there is no performance gain to be had if the main problem is the performance of the individual queries.
One obvious change you can make: use UNION ALL rather than UNION if the subqueries are exclusive, and save yourself some unnecessary sorts.
Beyond that, the logic of the subqueries looks suspect: you're hitting the same subset of rows each time, so you should consider using subquery factoring:
with cte as (
SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ?
)
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from (
select weighting from (
select weighting
from cte
where FACTOR_VALUE1=?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1>?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1<? AND FACTOR_VALUE2=?
order by id desc )
where rownum <= 1
...
)
Warning: tuning without understanding of the data volumes and distribution (skew), data structures or business rules - i.e. what you're asking us to do - is a mug's game. We're just guessing here, and the best you can hope for is that one of those guesses is lucky.

I think such a query can be optimized with quite a dramatic speed improvement. To achieve that, one must understand the logic behind it, though. On Stack Overflow, this is best done by providing a minimal example and some code.
Idea 1) - START_DATE, END_DATE
You've shown us only ?, so we don't know if the ? for the dates of all subqueries are the same. If so, you could filter down the table once in an initial step, without repeating the filtering 1500 times:
WITH datefiltered AS (SELECT * FROM ABC WHERE start_date <= ? AND end_date >= ?)
SELECT ... FROM datefiltered;
Idea 2) - UNION
Your pattern of UNIONing a lot of subqueries of the form SELECT * FROM (SELECT ... ORDER BY ...) WHERE rownum <= 1 is unusual. That is not a bad thing in itself, but it is likely that the database engine is not optimized for unusual queries.
You are using ORDER BY ID DESC ... WHERE ROWNUM <= 1, which means you are searching for the newest(?) row in a category.
The traditional pattern is to find a column (or more, or even an expression) and partition the query by it:
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM ABC
)
WHERE my_rank <= 1;
In your case, the category is likely much more complex, but you can put that in a big CASE statement that groups your data into your subqueries:
CASE WHEN factor1=xx AND factor2>yy THEN 'mycat1'
WHEN factor3>zz AND factor2<yy THEN 'mycat2'
etc
END;
Putting all three together would look like this:
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM (
SELECT id, col1, col2,
CASE WHEN factor...
END as mycategory
FROM ABC
WHERE start_date <= xx AND end_date >= yy
)
)
WHERE my_rank <= 1;

Related

Is it possible to get partitioned data using SQL?

I have an RDBMS table with a BIGINT column whose values are not sequential. I have a Java program where I want each thread to get data according to a PARTITION_SIZE, i.e. I want pairs of column values like the following after doing an ORDER BY on the result:
Column_Value at Row 0 , Column_Value at Row `PARTITION_SIZE`
Column_Value at Row `PARTITION_SIZE+1` , Column_Value at Row `2*PARTITION_SIZE`
Column_Value at Row `2*PARTITION_SIZE+1` , Column_Value at Row `3*PARTITION_SIZE`
Eventually, I will pass the above value ranges into a SELECT query's BETWEEN clause to get the divided data for each thread.
Currently, I do this partitioning in Java by putting all values in a List (after getting all values from the DB) and then getting the values at those specific indices - {0, PARTITION_SIZE}, {PARTITION_SIZE+1, 2*PARTITION_SIZE}, etc. The problem is that the List might have millions of records, and it is not advisable to store that in memory.
So I was wondering if it's possible to write a query in SQL itself that would return those ranges, like below:
row-1 -> minId , maxId
row-2 -> minId , maxId
....
Database is DB2.
For example,
For table column values 1,2,12,3,4,5,20,30,7,9,11, the result of the SQL query for a partition size = 2 should be {1,2}, {3,4}, {5,7}, {9,11}, {12,20}, {30}.
In my eyes the mod() function would solve your problem and you could choose a dynamic number of partitions with it.
WITH numbered_rows_temp as (
SELECT row_number() over (ORDER BY col1) as rownum,
col1,
...
coln
FROM yourtable)
SELECT * FROM numbered_rows_temp
WHERE mod(rownum, <numberofpartitions>) = 0
Fill in the appropriate <numberofpartitions>, and vary the compared result from 0 to <numberofpartitions> - 1 in your queries so that each thread picks up its own partition.
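For illustration, a sketch of what one thread's query might then look like, assuming 4 partitions and that this thread handles remainder 2 (yourtable and col1 are placeholders, not your real names):
WITH numbered_rows_temp as (
SELECT row_number() over (ORDER BY col1) as rownum,
col1
FROM yourtable)
SELECT * FROM numbered_rows_temp
-- each thread filters on its own remainder, here 2 out of 0..3
WHERE mod(rownum, 4) = 2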
Michael Tiefenbacher's answer is probably more useful, as it avoids an extra query, but if you do want to determine ID ranges, this might work for you:
WITH parms(partition_size) AS (VALUES 1000) -- or whatever
SELECT
MIN(id), MAX(id),
INT(rn / parms.partition_size) partition_num
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) rn
FROM yourtable
) t , parms
GROUP BY INT(rn / parms.partition_size)
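Each row of that result gives one thread its ID range; the thread could then run something like this sketch, binding that partition's MIN(id) and MAX(id) (yourtable is a placeholder):
-- bind the MIN(id) and MAX(id) returned for this thread's partition_num
SELECT *
FROM yourtable
WHERE id BETWEEN ? AND ?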

How can I create rows for dates missing from a range?

I am trying to get data for all dates in a range provided by my query, but I'm only getting the dates that actually exist in my table - missing dates are not reported. I need to create records in the table for those missing dates, with other columns left null, and then include them in the results.
My table table_name has records like:
ID Name Date_only
---- ---- -----------
1234 xyz 01-Jan-2014
1234 xyz 02-Jan-2014
1234 xyz 04-Jan-2014
...
For example, for the range 01-Jan-2014 to 04-Jan-2014, my query is:
select * from table_name
where id=1234
and (date_only >= '01-Jan-14' and date_only <= '04-Jan-14')
From Java or queried directly this shows three rows, with no data for 03-Jan-2014.
I need a single statement to insert rows for any missing dates into the table and return the data for all four rows. How can I do that?
UPDATE
The following query worked only when there is just 1 record available in the table, or when the search range is 2-5 days:
SELECT LEVEL, to_date('2014-11-08','yyyy-mm-dd') + level as day_as_date FROM DUAL CONNECT BY LEVEL <= 10 .
UPDATE WITH FIDDLE EXAMPLE
I got an error:
With my table data, executing the same query gives the error ORA-02393: exceeded call limit on CPU usage. The fiddle example is: my owntable sqlfiddle example. Thanks in advance.
You can use the SQL below for your purpose. The SQL fiddle is here: http://sqlfiddle.com/#!4/3ee61/27
with start_and_end_dates as (select min(onlydate) min_date
,max(onlydate) max_date
from mytable
where id='1001'
and onlydate >= to_date('01-Jan-2015','dd-Mon-YYYY')
and onlydate <= to_date('04-Jan-2015','dd-Mon-YYYY')),
missing_dates as (select min_date + level-1 as date_value
from start_and_end_dates connect by level <=(max_date - min_date) + 1)
select distinct id,name,date_value
from mytable,missing_dates
where id='1001'
order by date_value
EDIT 1: Using your other example. The SQL fiddle is http://sqlfiddle.com/#!4/4c727/16
with start_and_end_dates as (select min(onlydate) min_date
,max(onlydate) max_date
from mytable
where name='ABCD'),
missing_dates as (select min_date + level-1 as date_value
from start_and_end_dates connect by level <=(max_date - min_date) + 1)
select distinct id,name,date_value
from mytable,missing_dates
where name='ABCD'
order by date_value;
You can use a query like
SELECT LEVEL, to_date('2014-01-01','yyyy-mm-dd') + level as day_as_date
FROM DUAL
CONNECT BY LEVEL <= 1000
to get a list of 1000 days starting from Jan 1, 2014 (adjust to your needs).
Next, do an insert from select:
INSERT INTO table_name (date_only)
SELECT day_as_date FROM (<<THE_QUERY_ABOVE>>)
WHERE day_as_date NOT IN (SELECT date_only FROM table_name)
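For illustration, substituting the query above into the placeholder gives one combined statement along these lines (a sketch; it assumes only date_only needs to be populated and that existing date_only values are not null):
INSERT INTO table_name (date_only)
SELECT day_as_date
FROM (
SELECT to_date('2014-01-01','yyyy-mm-dd') + LEVEL AS day_as_date
FROM DUAL
CONNECT BY LEVEL <= 1000
)
WHERE day_as_date NOT IN (SELECT date_only FROM table_name)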

Static list MINUS select statement

I have a java program that returns a list of Long values (hundreds).
I would like to subtract from this list the values obtained from a select on an Oracle database,
something like this:
SELECT 23 as num FROM DUAL UNION ALL
SELECT 17 as num FROM DUAL UNION ALL
SELECT 19 as num FROM DUAL UNION ALL
SELECT 67 as num FROM DUAL UNION ALL...
...
...
SELECT 68 as num FROM DUAL MINUS
SELECT NUM FROM MYTABLE
I presume that this operation has some performance issues...
Are there other better approaches?
Thank you
Case 1:
Use Global Temporary Tables (GTT):
CREATE GLOBAL TEMPORARY TABLE my_temp_table (
column1 NUMBER
) ON COMMIT DELETE ROWS;
Insert the list of Long values into my_temp_table:
INSERT ALL
INTO my_temp_table (column1) VALUES (27)
INTO my_temp_table (column1) VALUES (32)
INTO my_temp_table (column1) VALUES (25)
.
.
.
SELECT 1 FROM DUAL;
Then:
SELECT * FROM my_temp_table
WHERE column1 NOT IN (SELECT NUM FROM MYTABLE);
Let me know if you have any issue.
Case 2:
Use a TABLE type:
CREATE TYPE number_tab IS TABLE OF number;
SELECT column_value AS num
FROM TABLE (number_tab(1,2,3,4,5,6)) temp_table
WHERE NOT EXISTS (SELECT 1 FROM MYTABLE WHERE MYTABLE.NUM = temp_table.num);
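If you prefer to keep the MINUS form from your question, a variant of Case 2 using the same number_tab type might look like this sketch:
SELECT column_value AS num
FROM TABLE (number_tab(1,2,3,4,5,6))
MINUS
SELECT NUM FROM MYTABLE;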
Assuming MyTable is much bigger than literal values, I think the best option is using a temporary table to store your values. This way your query is a lot cleaner.
If you are working in a concurrent environment (e.g. a typical web app), use an id field and delete the rows when finished. Summarizing (see the sketch after this list):
preliminary: create a table for temporary values TEMPTABLE(id, value)
for each transaction
get new unique/atomic id (new oracle sequence value, for example)
for each literal value: insert into temptable(new_id, value)
select * from temptable where id = new_id minus...
process result
delete from temp_table where id = new_id
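A minimal sketch of that flow in SQL, assuming a sequence named TEMPTABLE_SEQ and the TEMPTABLE(id, value) table from the preliminary step (the bind variable names are hypothetical):
-- 1. reserve a unique id for this transaction
SELECT TEMPTABLE_SEQ.NEXTVAL FROM DUAL;
-- 2. insert each literal value under that id (repeated per value)
INSERT INTO TEMPTABLE (id, value) VALUES (:new_id, :val);
-- 3. compute the difference
SELECT value FROM TEMPTABLE WHERE id = :new_id
MINUS
SELECT NUM FROM MYTABLE;
-- 4. clean up when finished
DELETE FROM TEMPTABLE WHERE id = :new_id;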
Temporary tables are a good solution in Oracle. This approach can also be used with an ORM persistence layer.

Select query with multiple where clauses

I have a list of serial numbers: 111111, 222222, AAAAAA, FFFFFF and I want to return a corresponding value or null from a table depending on whether or not the value exists.
Currently I loop through my list of serial numbers, query using the following statement:
"SELECT cnum FROM table WHERE serial_num = " + serialNumber[i];
and then use the value if one is returned.
I would prefer to do this in one query and get results similar to:
Row | cnum
------------
1 | 157
2 | 4F2
3 | null
4 | 93O
5 | null
6 | 9F3
Is there a query to do this or am I stuck with a loop?
It sounds as if you have some sort of Java Array or Collection of serial numbers, and perhaps you want to check to see if these numbers are found in the DB2 table, and you'd like to do the whole list all at once, rather than one at a time. Good thinking.
So you want to have a set of rows with which you can do a left join to the table, with null indicating that the corresponding serial was not in the table. Several answers have started to use this approach. But they are not returning your row number, and they are using SELECT UNION's which seems a round-about way to get what you want.
VALUES clause
Your FROM clause can be a "nested-table-expression", which can be a (fullselect) with a correlation-clause. The (fullselect) can, in turn, be a VALUES clause. So you could have something like this:
FROM (VALUES (1, '157'), (2, '4F2'), (3, '5MISSING'), (4, '93O'), ...
) as Lst (rw, sn)
You can then LEFT JOIN this to the table, and get a two-column result table like you asked for:
SELECT Lst.rw, t.serial_num
FROM (VALUES (1, '157'), (2, '4F2'), (3, '5MISSING'), (4, '93O'), ...
) as Lst (rw, sn)
LEFT JOIN sometable t ON t.serial_num = Lst.sn
With this method, you will probably need a loop to build your dynamic SQL statement string, using the values from your collection.
If it was embedded SQL, we might be able to reference a host array variable containing your serial numbers. But alas, in Java I am not sure how to manage using the list directly in SQL, without using some loop.
If you use only an "IN" it is not going to return null for the missing values, forcing you to do some coding in the application (probably the most efficient way).
If you wanted the database to do all the work (which may or may not be ideal), then you would have to trick DB2 into returning your list regardless.
Something like this might work, faking the null values to be returned from sysdummy with the common table expression (with part):
with all_serials as (
select '111111' as serialNumber from sysibm.sysdummy1 union all
select '222222' as serialNumber from sysibm.sysdummy1 union all
select 'AAAAAA' as serialNumber from sysibm.sysdummy1 union all
select 'FFFFFF' as serialNumber from sysibm.sysdummy1
)
select
t1.serialNumber,
t2.serialNumber as serialNumberExists
from
all_serials as t1 left outer join
/* Make sure the grain of the_Table is at "serialNumber" */
the_table as t2 on t1.serialNumber = t2.serialNumber
You can use the SQL IN keyword. You'd need to dynamically generate the list, but basically it'd look like:
SELECT cnum FROM table WHERE serial_num in ('111111', '2222222', '3333333', 'AAAAAAA'...)
Try something like:
select t.cnum
from
(select '111111' serial_num from sysibm.sysdummy1 union all
select '222222' serial_num from sysibm.sysdummy1 union all
select 'AAAAAA' serial_num from sysibm.sysdummy1 union all
select 'FFFFFF' serial_num from sysibm.sysdummy1) v
left join table t on v.serial_num = t.serial_num
I'm not sure if I get you correctly, but this could help:
String query = "SELECT cnum FROM table WHERE ";
for (int i = 0; i < serialNumber.length; i++)
    query += "serial_num='" + serialNumber[i] + "' OR ";
query += "serial_num IS NULL"; // also closes off the trailing OR left by the loop
System.out.println(query);

SQL with rank and partition

I need to execute this sql:
select * from
(select nt.*,
rank() over (partition by feld0 order by feld1 desc) as ranking
from (select bla from test) nt)
where ranking < 3
order by 1,2
This SQL works fine in my Oracle database, but in the H2 database, which I use sometimes, it doesn't work because rank and partition are not defined.
So I need to transform this SQL so that it works in both H2 and Oracle.
I want to use Java to execute this SQL. So is it possible to split this SQL into different statements without rank and partition, and then handle it with Java?
If feld1 is unique within feld0 partitions, you could:
select *
, (
select count(*)
from YourTable yt2
where yt2.feld0 = yt1.feld0 -- Same partition
and yt2.feld1 >= yt1.feld1 -- Same or higher feld1, i.e. lower or equal rank (the original orders by feld1 desc)
) as ranking
from YourTable yt1
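To mirror the original statement, the correlated count can then be filtered just like the rank() version; a sketch using the same table and column names as the question:
select *
from (
select yt1.*
, (
select count(*)
from YourTable yt2
where yt2.feld0 = yt1.feld0
and yt2.feld1 >= yt1.feld1
) as ranking
from YourTable yt1
)
where ranking < 3
order by 1, 2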
