story
I have two pieces of code that perform the same task in different ways.
number_of_factors = 10
number_of_formula_portions = 5
number_of_travellers = 15
cycles = 2
Approach 1: call the DB multiple times (5*15*2 calls to read the data) and use Ehcache.
Approach 2: call the DB only once (a single query containing 10*5*15*2 SQL UNION operations).
Logically the second one should perform better because there is only one DB call, which saves time. But in practice the second one takes more time to evaluate the query.
I have a dynamically generated union query. It has 10*5*15*2 (number_of_factors * number_of_formula_portions * number_of_travellers * cycles) union statements. When I run it, the DB takes too much time. But when I run it for one traveller via the application, it is fine. I thought that logically, reading all the data at once would give a lot of performance, but the DB is getting stuck.
UNIT QUERY
select ? as FACTORNAME,
WEIGHTING,
? as KEYCYCLE,
? as KEYTRAVELLER,
? as KEYSUBFORMULA
from (
(SELECT *
FROM (
(SELECT ID,
ELEMENT_LOGIC_ID,
FACTOR_VALUE1,
FACTOR_VALUE2,
FACTOR_VALUE3,
FACTOR_VALUE4,
FACTOR_VALUE5,
FACTOR_VALUE6,
FACTOR_VALUE7,
WEIGHTING,
START_DATE,
END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =?
AND START_DATE <= ?
AND END_DATE >= ?
AND FACTOR_VALUE1=?
ORDER BY ID DESC )
)
WHERE ROWNUM <= 1)
)
PARAMETERS
F577(String), 0(String), 0(String), 1(String), 577(Long), 2018-06-28 00:00:00.0(Timestamp), 2018-06-28 00:00:00.0(Timestamp), 1(String),
SAMPLE UNION QUERY
select * from (
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1>? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1<? AND FACTOR_VALUE2=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
...
)
note:
The part below varies dynamically in the query, depending on the factor match type [equal, lower bound, upper bound]. There are 7 factors: FACTOR_VALUE1, FACTOR_VALUE2, and so on. So I am not going to show the actual SQL here; the full query is 1.8 MB.
equal
FACTOR_VALUE1=?
or lower bound
FACTOR_VALUE1<?
or upper bound
FACTOR_VALUE1>?
business logic behind the scenes
Sorry for providing a sample query instead of the actual one. I am expecting comments on my approach.
It's like having exam result data:
there are 10 subjects in the school,
there are 15 students,
and there are 2 term tests.
Those results are in the DB.
This data can be read in 2 ways:
read all the data at once and filter at the application level [large union query], or
read one student's results for one term at a time [small query].
All ideas are welcome.
" I thought logically reading all data at once has a lot of performance, But DB is getting stuck."
Up to a point. One database call will likely be more efficient in terms of network traffic. But the actual call you make executes lots of queries and glues them together with UNION: so there is no performance gain to be had if the main problem is the performance of the individual queries.
One obvious change you can make: use UNION ALL rather than UNION if the subqueries are exclusive, and save yourself some unnecessary sorts.
Beyond that, the logic of the subqueries looks suspect: you're hitting the same subset of rows each time, so you should consider using subquery factoring:
with cte as (
SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ?
)
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from (
select weighting from (
select weighting
from cte
where FACTOR_VALUE1=?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1>?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1<? AND FACTOR_VALUE2=?
order by id desc )
where rownum <= 1
...
)
Warning: tuning without understanding of the data volumes and distribution (skew), data structures or business rules - i.e. what you're asking us to do - is a mug's game. We're just guessing here, and the best you can hope for is that one of those guesses is lucky.
I think such a query can be optimized with quite a dramatic speed improvement. To achieve that, one must understand the logic behind it, though. On Stack Overflow, this is best done by providing a minimal example and some code.
Idea 1) - START_DATE, END_DATE
You've shown us only ?, so we don't know whether the date parameters are the same across all the subqueries. If so, you could filter the table down once in an initial step, instead of repeating the filter 1500 times:
WITH datefiltered AS (SELECT * FROM ABC WHERE start_date <= ? AND end_date >= ?)
SELECT ... FROM datefiltered;
Idea 2) - UNION
Your pattern of UNIONing a lot of subqueries of the form SELECT * FROM (SELECT ... ORDER BY ...) WHERE rownum <= 1 is unusual. That is not a bad thing in itself, but it is likely that the database engine is not optimized for unusual queries.
You are using ORDER BY ID DESC ... WHERE ROWNUM <= 1, which means you are searching for the newest(?) row in a category.
The traditional pattern is to find a column (or more, or even an expression) and partition the query by it:
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM ABC
)
WHERE my_rank <= 1;
In your case, the category is likely much more complex, but you can put that in a big CASE statement that groups your data into your subqueries:
CASE WHEN factor1=xx AND factor2>yy THEN 'mycat1'
WHEN factor3>zz AND factor2<yy THEN 'mycat2'
etc
END;
Putting all three together would look like this:
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER() OVER (PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM (
SELECT id, col1, col2,
CASE WHEN factor...
END as mycategory
FROM ABC
WHERE start_date <= xx AND end_date >= yy
)
)
WHERE my_rank <= 1;
I have an RDBMS table with a column of type BIGINT whose values are not sequential. I have a Java program where I want each thread to get data per PARTITION_SIZE, i.e. I want pairs of column values like the following, after doing ORDER BY on the result:
Column_Value at Row 0 , Column_Value at Row `PARTITION_SIZE`
Column_Value at Row `PARTITION_SIZE+1` , Column_Value at Row `2*PARTITION_SIZE`
Column_Value at Row `2*PARTITION_SIZE+1` , Column_Value at Row `3*PARTITION_SIZE`
Eventually, I will pass the above value ranges into a SELECT query's BETWEEN clause to get the divided data for each thread.
Currently, I am able to do this partitioning in Java by putting all the values in a List (after reading them all from the DB) and then picking the values at those specific indices: {0, PARTITION_SIZE}, {PARTITION_SIZE+1, 2*PARTITION_SIZE}, etc. The problem is that the List might hold millions of records, and it is not advisable to keep them all in memory.
So I was wondering whether it's possible to write a query in SQL itself that would return those ranges, like below?
row-1 -> minId , maxId
row-2 -> minId , maxId
....
Database is DB2.
For example, for the table column values 1,2,12,3,4,5,20,30,7,9,11, the result of the SQL query for a partition size = 2 should be {1,2}, {3,4}, {5,7}, {9,11}, {12,20}, {30}.
In my eyes the mod() function would solve your problem, and it lets you choose a dynamic number of partitions.
WITH numbered_rows_temp AS (
SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rownum,
col1,
...
coln
FROM yourtable)
SELECT * FROM numbered_rows_temp
WHERE mod(rownum, <numberofpartitions>) = 0
Fill in the appropriate <numberofpartitions> and vary the compared result from 0 to <numberofpartitions> - 1 across your queries.
Michael Tiefenbacher's answer is probably more useful, as it avoids an extra query, but if you do want to determine ID ranges, this might work for you:
WITH parms(partition_size) AS (VALUES 1000) -- or whatever
SELECT
MIN(id), MAX(id),
INT(rn / parms.partition_size) partition_num
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) rn
FROM yourtable
) t , parms
GROUP BY INT(rn / parms.partition_size)
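As a usage sketch of how those ranges would be consumed (the table name yourtable and the 8-thread pool are illustrative assumptions, not from the question): read the (min, max) pairs once, then hand each range to a worker that runs the BETWEEN query on a connection of its own.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

public class RangePartitioner {

    // Compute the ranges once, then let each worker fetch its slice.
    public void processInRanges(DataSource ds) throws SQLException {
        String rangeSql =
            "WITH parms(partition_size) AS (VALUES 1000) "
          + "SELECT MIN(id) AS min_id, MAX(id) AS max_id "
          + "FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) rn FROM yourtable) t, parms "
          + "GROUP BY INT(rn / parms.partition_size)";
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try (Connection con = ds.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(rangeSql)) {
            while (rs.next()) {
                long min = rs.getLong("min_id");
                long max = rs.getLong("max_id");
                pool.submit(() -> fetchRange(ds, min, max));
            }
        }
        pool.shutdown();
    }

    // Each worker uses its own connection and the BETWEEN query the
    // question intended to run per thread.
    private void fetchRange(DataSource ds, long min, long max) {
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(
                 "SELECT * FROM yourtable WHERE id BETWEEN ? AND ?")) {
            ps.setLong(1, min);
            ps.setLong(2, max);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process one row
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}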
I'm trying to use the same query to sort my table by different columns, but when I pass the ORDER BY column as a parameter, quotes are added before and after the column name. For ORDER BY to work, the column name has to appear without quotes; otherwise MySQL ignores it.
Example or query to execute:
select * from app_user ORDER BY mobile_token ASC LIMIT 0 , 20
This is what Hibernate sends to MySQL:
select * from app_user ORDER BY 'mobile_token' ASC LIMIT 0 , 20
Java query:
query = JPA.em().createNativeQuery("select * from app_user ORDER BY :column ASC LIMIT :init , :page",AppUser.class);
query.setParameter("column", column);
query.setParameter("init", pageNumber*pageSize);
query.setParameter("page", pageSize);
I could change the NativeQuery to:
"select * from app_user ORDER BY "+column+" ASC LIMIT :init , :page"
but this is going to make my app unsafe.
You can only pass values as parameters to a query, not column or field names. Otherwise the database couldn't know which columns are actually used in the query, which would make it impossible to prepare the execution plan.
So your solution using concatenation is the only one. Just make sure the column name doesn't come from the user; or, if it does, check that it's a valid column name and that the user is allowed to use it.
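One common way to make that check concrete is a fixed whitelist of sortable columns. A minimal sketch (the set of allowed column names here is a hypothetical example, not from the question):

import java.util.List;
import java.util.Set;
import javax.persistence.Query;

// Only column names from this fixed set can ever reach the SQL string.
private static final Set<String> SORTABLE_COLUMNS =
        Set.of("mobile_token", "user_name", "created_at");

@SuppressWarnings("unchecked")
public List<AppUser> findSorted(String column, int pageNumber, int pageSize) {
    if (!SORTABLE_COLUMNS.contains(column)) {
        throw new IllegalArgumentException("Unsupported sort column: " + column);
    }
    // Safe to concatenate: the value is one of our own constants.
    Query query = JPA.em().createNativeQuery(
            "select * from app_user ORDER BY " + column + " ASC LIMIT :init , :page",
            AppUser.class);
    query.setParameter("init", pageNumber * pageSize);
    query.setParameter("page", pageSize);
    return query.getResultList();
}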
I have a Java program that returns a list of Long values (hundreds of them).
I would like to subtract from this list the values obtained from a select on an Oracle database, something like this:
SELECT 23 as num FROM DUAL UNION ALL
SELECT 17 as num FROM DUAL UNION ALL
SELECT 19 as num FROM DUAL UNION ALL
SELECT 67 as num FROM DUAL UNION ALL...
...
...
SELECT 68 as num FROM DUAL MINUS
SELECT NUM FROM MYTABLE
I presume that this operation has some performance issues...
Are there better approaches?
Thank you
Case 1:
Use a Global Temporary Table (GTT):
CREATE GLOBAL TEMPORARY TABLE my_temp_table (
column1 NUMBER
) ON COMMIT DELETE ROWS;
Insert the list of Long values into my_temp_table:
INSERT ALL
INTO my_temp_table (column1) VALUES (27)
INTO my_temp_table (column1) VALUES (32)
INTO my_temp_table (column1) VALUES (25)
.
.
.
SELECT 1 FROM DUAL;
Then:
SELECT * FROM my_temp_table
WHERE column1 NOT IN (SELECT NUM FROM MYTABLE);
Let me know if you have any issue.
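From Java, the list of Long values would typically be loaded into the GTT with a batched PreparedStatement rather than a hand-written INSERT ALL. A minimal sketch, assuming the my_temp_table and MYTABLE from above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// Load the values in one batch, run the anti-join, then COMMIT; the
// ON COMMIT DELETE ROWS clause empties the GTT automatically.
void subtractViaGtt(Connection con, List<Long> values) throws SQLException {
    con.setAutoCommit(false);
    try (PreparedStatement ins = con.prepareStatement(
            "INSERT INTO my_temp_table (column1) VALUES (?)")) {
        for (Long v : values) {
            ins.setLong(1, v);
            ins.addBatch();
        }
        ins.executeBatch();
    }
    try (PreparedStatement sel = con.prepareStatement(
            "SELECT column1 FROM my_temp_table "
          + "WHERE column1 NOT IN (SELECT num FROM mytable)");
         ResultSet rs = sel.executeQuery()) {
        while (rs.next()) {
            // rs.getLong(1) is in the list but not in MYTABLE
        }
    }
    con.commit();
}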
Case 2:
Use a nested TABLE type:
CREATE TYPE number_tab IS TABLE OF number;
SELECT column_value AS num
FROM TABLE (number_tab(1,2,3,4,5,6)) temp_table
WHERE NOT EXISTS (SELECT 1 FROM MYTABLE WHERE MYTABLE.NUM = temp_table.num);
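From JDBC, the values can be bound as a single collection parameter instead of being spliced into the SQL text. A sketch assuming the number_tab type above and an Oracle 12c or later JDBC driver (createOracleArray is an Oracle-specific API):

import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import oracle.jdbc.OracleConnection;

// Bind the Long values as one NUMBER_TAB parameter; no dynamic SQL needed.
void subtractViaType(Connection con, Long[] values) throws SQLException {
    OracleConnection oracleCon = con.unwrap(OracleConnection.class);
    Array numbers = oracleCon.createOracleArray("NUMBER_TAB", values);
    String sql =
        "SELECT t.column_value AS num FROM TABLE(?) t "
      + "WHERE NOT EXISTS (SELECT 1 FROM mytable m WHERE m.num = t.column_value)";
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setArray(1, numbers);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // rs.getLong("num") is in the list but not in MYTABLE
            }
        }
    }
}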
Assuming MYTABLE is much bigger than the list of literal values, I think the best option is to use a temporary table to store your values. This way your query is a lot cleaner.
If you are working in a concurrent environment (e.g. a typical web app), use an id field and delete the rows when finished. Summarizing:
preliminary: create a table for temporary values TEMPTABLE(id, value)
for each transaction
get new unique/atomic id (new oracle sequence value, for example)
for each literal value: insert into temptable(new_id, value)
select * from temptable where id = new_id minus...
process result
delete from temptable where id = new_id
Temporary tables are a good solution in Oracle, and this pattern can be used with an ORM persistence layer. A sketch of one such transaction follows.
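A hedged sketch of the steps above (temptable_seq is a hypothetical sequence name; TEMPTABLE(id, value) is the table suggested earlier):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// One transaction: reserve an id, insert the literals under it, run the
// MINUS, then remove this transaction's rows.
void subtractOnce(Connection con, List<Long> values) throws SQLException {
    con.setAutoCommit(false);
    long newId;
    try (Statement st = con.createStatement();
         ResultSet rs = st.executeQuery("SELECT temptable_seq.NEXTVAL FROM dual")) {
        rs.next();
        newId = rs.getLong(1);
    }
    try (PreparedStatement ins = con.prepareStatement(
            "INSERT INTO temptable (id, value) VALUES (?, ?)")) {
        for (Long v : values) {
            ins.setLong(1, newId);
            ins.setLong(2, v);
            ins.addBatch();
        }
        ins.executeBatch();
    }
    try (PreparedStatement sel = con.prepareStatement(
            "SELECT value FROM temptable WHERE id = ? MINUS SELECT num FROM mytable")) {
        sel.setLong(1, newId);
        try (ResultSet rs = sel.executeQuery()) {
            while (rs.next()) {
                // process rs.getLong(1)
            }
        }
    }
    try (PreparedStatement del = con.prepareStatement(
            "DELETE FROM temptable WHERE id = ?")) {
        del.setLong(1, newId);
        del.executeUpdate();
    }
    con.commit();
}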
When using a Java PreparedStatement, the question-mark placeholders weren't being detected. It would throw the error "The column index is out of range: 1, number of columns: 0" when invoking statementName.setLong(1, 123). My example is from Postgres 8.4, but the problem occurs before the SQL has a chance to reach the SQL server.
After comparing against some working prepared statements, I realized that the broken one contained a subquery similar to:
SELECT * FROM (
SELECT DISTINCT (name)
id,
name
FROM MyTable
WHERE id > ?
ORDER BY name) AS Level1
ORDER BY 1
The solution that worked for me was to convert the query to a CTE (Common Table Expression). The revised query looks like this:
WITH Level1 AS (
SELECT DISTINCT (name)
id,
name
FROM MyTable
WHERE id > ?
ORDER BY name)
SELECT *
FROM Level1
ORDER BY 1
In JDBC, the parameter indexes for prepared statements begin at 1 instead of 0.
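For completeness, a minimal sketch of the working statement with the placeholder bound at index 1 (connection setup elided; the query is the CTE form from above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// The CTE form is parsed correctly by the driver, and the single ? is
// bound at index 1 because JDBC parameter indexes start at 1.
void queryLevels(Connection con, long minId) throws SQLException {
    String sql =
        "WITH Level1 AS ("
      + " SELECT DISTINCT (name) id, name"
      + " FROM MyTable"
      + " WHERE id > ?"
      + " ORDER BY name) "
      + "SELECT * FROM Level1 ORDER BY 1";
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setLong(1, minId);   // first (and only) placeholder
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process rows
            }
        }
    }
}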