I am trying to improve the performance of my application. To do this, I am replacing plain statements with prepared statements so that Oracle's cache can be used.
However, several of my queries are dynamic (optional criteria and optional tables), and I would like to know the best way to reduce their execution time.
On Stack Overflow, I found two solutions.
The first is to build the query with if statements:
if (a.equals("Test")) {
    whereClause.append("COLUMN1 = ? ");
}
if (b.equals("Test2")) {
    whereClause.append("COLUMN2 = ? ");
}
// etc.
The second is to write one query containing all the criteria, each guarded by "? IS NULL OR ...":
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
In your opinion, which approach is best for performance, given that a query may contain 40 to 50 criteria and that some tables can be optional too? Or is there another solution?
The database I use is Oracle, accessed through a DataSource, with JDK 1.5.
Thank you in advance for your help.
First, JDBC and Oracle are capable of handling and caching multiple prepared statements.
First approach:
if (a.equals("Test")) {
    whereClause.append("COLUMN1 = ? ");
}
if (b.equals("Test2")) {
    whereClause.append("COLUMN2 = ? ");
}
will certainly lead to more distinct statements, and thus to more prepared statements, but each of these statements can be optimized at prepare time because none of its variables is optional.
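A minimal sketch of the first approach, assuming each criterion arrives as a nullable Java value. DynamicWhereBuilder and its methods are hypothetical helpers, not part of JDBC; the only JDBC call used is PreparedStatement.setObject. It adds a condition only when its value is present, joins the conditions with AND (which the fragment above leaves out), and keeps the bind values in the same order as the ? placeholders.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a JDBC API): builds a WHERE clause from the
// criteria that are actually present and keeps their bind values in order.
class DynamicWhereBuilder {
    private final StringBuilder where = new StringBuilder();
    private final List<Object> params = new ArrayList<Object>();

    // Adds the condition (which must contain exactly one ?) only if value != null.
    DynamicWhereBuilder addIfPresent(String condition, Object value) {
        if (value != null) {
            where.append(where.length() == 0 ? " WHERE " : " AND ");
            where.append(condition);
            params.add(value);
        }
        return this;
    }

    // Appends the generated WHERE clause (if any) to the SELECT ... FROM part.
    String toSql(String selectFrom) {
        return selectFrom + where;
    }

    List<Object> getParams() {
        return params;
    }

    // Binds the collected values; JDBC parameter indexes start at 1.
    void bind(PreparedStatement ps) throws SQLException {
        for (int i = 0; i < params.size(); i++) {
            ps.setObject(i + 1, params.get(i));
        }
    }
}
```

For example, adding "COLUMN1 = ?" with "Test", "COLUMN2 = ?" with null and "COLUMN3 = ?" with 42 yields SELECT * FROM T WHERE COLUMN1 = ? AND COLUMN3 = ? with two bind values. Because the SQL text depends only on which criteria are present, repeated combinations produce identical texts that can be reused.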
Second approach:
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
will result in exactly one prepared statement, but the query plan chosen at prepare time has to work for every combination of null and non-null parameters.
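For comparison, a sketch of the second approach under the same assumptions: the SQL text never changes, and each criterion consumes two bind positions carrying the same (possibly null) value. OptionalCriteriaQuery is a hypothetical name, and the binding step assumes VARCHAR columns, since a null has to be bound with setNull and an explicit SQL type.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the "(? IS NULL OR col = ?)" style: one fixed SQL
// text, two bind positions per criterion, both carrying the same value.
class OptionalCriteriaQuery {
    // Assumes at least one column.
    static String buildSql(String selectFrom, String[] columns) {
        StringBuilder sql = new StringBuilder(selectFrom).append(" WHERE ");
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sql.append(" AND ");
            }
            sql.append("(? IS NULL OR ").append(columns[i]).append(" = ?)");
        }
        return sql.toString();
    }

    // Bind order: every value (null meaning "criterion not used") twice.
    static List<Object> bindOrder(Object[] values) {
        List<Object> params = new ArrayList<Object>();
        for (Object v : values) {
            params.add(v);
            params.add(v);
        }
        return params;
    }

    static void bind(PreparedStatement ps, List<Object> params) throws SQLException {
        for (int i = 0; i < params.size(); i++) {
            Object v = params.get(i);
            if (v == null) {
                ps.setNull(i + 1, Types.VARCHAR); // assumption: VARCHAR columns
            } else {
                ps.setObject(i + 1, v);
            }
        }
    }
}
```

With 40 to 50 criteria this means 80 to 100 bind positions per execution, all against a single plan.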
So it really comes down to benchmarking and to how many optional parameters there are. If you have too many prepared statements, some may not be cached (but it may also be that those are used so seldom that this does not hurt overall performance).
I also don't know how much overhead the optional parameters in the second approach create, given that they may be evaluated for every row because the optimizer cannot remove them.
Finally, if you have indexes that cover some combinations of your optional parameters, the first approach should be faster, because during prepare the candidate indexes are evaluated and the best one is chosen (query plan).
The biggest win of a prepared statement is that the query plan (optimization) is computed ahead of time. To be honest, I doubt that optimization gains much if you have 20 criteria like that and only a few are actually used. Why don't you test it and see which is quickest?
Also, I think making a stored procedure out of this could be very advantageous.
I would use a PreparedStatement with a dynamically built query and no additional logic. If a query repeats, JDBC will find the previously executed, cached statement.
I agree with Evgeniy, PreparedStatements are the way to go. If you have many parameters, the upkeep is easier if you use named parameters instead of question marks. See: http://www.javaworld.com/article/2077706/core-java/named-parameters-for-preparedstatement.html. I have used this with about 20 criteria.
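The idea behind named parameters can be sketched in a few lines. This is a hypothetical NamedParameterParser written for illustration, not the code from the linked JavaWorld article: it rewrites :name placeholders to ? (skipping text inside single quotes), records every position each name occupies, and sets the same value at all of them. Numeric names such as :1 are left untouched. The driver still receives ordinary positional parameters.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rewrites ":name" placeholders to "?" and records every
// 1-based position at which each name occurs.
class NamedParameterParser {
    final String sql;
    final Map<String, List<Integer>> positions = new LinkedHashMap<String, List<Integer>>();

    NamedParameterParser(String namedSql) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        int index = 1;
        int i = 0;
        while (i < namedSql.length()) {
            char c = namedSql.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote; // text inside string literals is copied as-is
                out.append(c);
                i++;
            } else if (!inQuote && c == ':' && i + 1 < namedSql.length()
                    && Character.isJavaIdentifierStart(namedSql.charAt(i + 1))) {
                int j = i + 1;
                while (j < namedSql.length() && Character.isJavaIdentifierPart(namedSql.charAt(j))) {
                    j++;
                }
                String name = namedSql.substring(i + 1, j);
                List<Integer> list = positions.get(name);
                if (list == null) {
                    list = new ArrayList<Integer>();
                    positions.put(name, list);
                }
                list.add(Integer.valueOf(index++));
                out.append('?');
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        this.sql = out.toString();
    }

    // Sets the same value at every position where the name occurs.
    void setString(PreparedStatement ps, String name, String value) throws SQLException {
        List<Integer> list = positions.get(name);
        if (list == null) {
            throw new IllegalArgumentException("Unknown parameter: " + name);
        }
        for (Integer pos : list) {
            ps.setString(pos.intValue(), value);
        }
    }
}
```

You would prepare the statement with parser.sql and then call parser.setString(ps, "name", value) once per name, however many times that name appears.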
Related
When using JDBC's PreparedStatements to query Oracle, consider this:
String qry1 = "SELECT col1 FROM table1 WHERE rownum=? AND col2=?";
String qry2 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
String qry3 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=" + someVariable ;
The logic dictates that the value of rownum is always a constant (1 in this example), while the value of col2 is a changing variable.
Question 1: Are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry1 where rownum value is parameterized, over qry2 where rownum's constant value is hardcoded?
Question 2: Ignoring non-performance considerations (such as SQL injection, readability, etc.), are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry2 over qry3 (in which the value of col2 is explicitly appended, not parameterized)?
Answer 1: There are no performance advantages to using qry1 (a softcoded query) over qry2 (a query with reasonable bind variables).
Bind variables improve performance by reducing query parsing; if the bind variable is a constant there is no extra parsing to avoid.
(There are probably some weird examples where adding extra bind variables improves the performance of one specific query. Like with any forecasting program, occasionally if you feed bad information to the Oracle optimizer the result will be better. But it's important to understand that those are exceptional cases.)
Answer 2: There are many performance advantages to using qry2 (a query with reasonable bind variables) over qry3 (a hardcoded query).
Bind variables allow Oracle to reuse a lot of the work that goes into query parsing (query compilation). For example, for each query Oracle needs to check that the user has access to view the relevant tables. With bind variables, that work only needs to be done once for all executions of the query.
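A small illustration of why the statement text matters, assuming the default CURSOR_SHARING=EXACT behaviour, where Oracle matches shared cursors on the exact SQL text. StatementTextDemo is hypothetical and never touches a database; it only counts how many distinct texts qry2 and qry3 produce for the same input values, which roughly corresponds to how many statements Oracle would have to parse from scratch.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: counts the distinct SQL texts that qry2 (bind variable)
// and qry3 (literal appended) would send for the same col2 values.
class StatementTextDemo {
    static int distinctTexts(String[] col2Values, boolean useBind) {
        Set<String> texts = new HashSet<String>();
        for (String v : col2Values) {
            String sql = useBind
                    ? "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?"     // qry2: same text every time
                    : "SELECT col1 FROM table1 WHERE rownum=1 AND col2=" + v; // qry3: new text per value
            texts.add(sql);
        }
        return texts.size();
    }

    public static void main(String[] args) {
        String[] values = {"10", "20", "30"};
        System.out.println(distinctTexts(values, true));  // 1
        System.out.println(distinctTexts(values, false)); // 3
    }
}
```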
Bind variables also allow Oracle to use some extra optimization tricks that only occur after the Nth run. For example, Oracle can use cardinality feedback to improve the second execution of a query. When Oracle makes a mistake in a plan, for example if it estimates a join will produce 1 row when it really produces 1 million, it can sometimes record that mistake and use that information to improve the next run. Without bind variables the next run will be different and it won't be able to fix that mistake.
Bind variables also allow for many different plan management features. Sometimes a DBA needs to change an execution plan without changing the text of the query. Features like SQL plan baselines, profiles, outlines, and DBMS_ADVANCED_REWRITE will not work if the query text is constantly changing.
On the other hand, there are a few reasonable cases where it's better to hard-code the queries. Occasionally an Oracle feature like partition pruning cannot understand the expression and it helps to hardcode the value. For large data warehouse queries the extra time to parse a query may be worth it if the query is going to run for a long time anyway.
(Caching is unlikely to affect either scenario. Result caching of a statement is rare; it's much more likely that Oracle will cache only the blocks of the tables used in the statement. The buffer cache probably does not care whether those blocks are accessed by one statement many times or by many statements one time.)
I would like to know if there is any smart way of writing a SQL statement for a search engine that has 5 optional parameters. All parameters can be used, or only one of them, or any mix of them, which makes 2^5 = 32 different combinations.
The statement needs to be prepared to avoid SQL injections.
I've looked at this post, but it doesn't quite cut it.
What I'm looking for is something like:
String sql = "SELECT * FROM table WHERE (optional1)=? AND (optional2)=? AND (optional3)=? AND (optional4)=? AND (optional5)=?";
prepared.setString(1, optional1);
// and so on...
Use your Java code to add the options to the WHERE clause based on the presence of your arguments (their length or existence, whichever applies). That way, if an optional parameter is not needed, it won't even be part of your SQL expression. Simple.
@a1ex07 has given the answer for doing this as a single query: use NULLs and check for them in each condition.
WHERE
table.x = CASE WHEN @x IS NULL THEN table.x ELSE @x END
or...
WHERE
(@x IS NULL OR table.x = @x)
or...
WHERE
table.x = COALESCE(@x, table.x)
etc, etc.
There is one warning, however: as convenient as it is to make one query do all of this, all of these answers are sub-optimal. Often they're horrendous.
When you write ONE query, only ONE execution plan is created. And that ONE execution plan must be suitable for ALL possible combinations of values. But that fixes which indexes are searched, what order they're searched, etc. It yields the least worst plan for a one-size-fits-all query.
Instead, you're better off adding the conditions as necessary. You still parameterise them, but you don't include a condition if you know the parameter is NULL.
This is a good link explaining it further. It's for MS SQL Server specifically, but it's generally applicable to any RDBMS that caches plans after compiling the SQL.
http://www.sommarskog.se/dyn-search.html
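To put the one-plan-versus-many-plans trade-off in numbers: with 5 optional criteria, building the WHERE clause only from the criteria that are present yields at most 2^5 = 32 distinct statement texts, each of which can get a plan suited to its own combination. The sketch below is hypothetical (the class name and my_table are placeholders, optional1 to optional5 come from the question); it enumerates every combination as a bitmask and counts the distinct SQL strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: one SQL text per combination of supplied criteria.
class SearchCombinations {
    static final String[] COLUMNS = {"optional1", "optional2", "optional3", "optional4", "optional5"};

    // Bit i of mask set means criterion i was supplied by the user.
    static String sqlFor(int mask) {
        StringBuilder sql = new StringBuilder("SELECT * FROM my_table");
        boolean first = true;
        for (int i = 0; i < COLUMNS.length; i++) {
            if ((mask & (1 << i)) != 0) {
                sql.append(first ? " WHERE " : " AND ").append(COLUMNS[i]).append(" = ?");
                first = false;
            }
        }
        return sql.toString();
    }

    // Enumerates all 2^5 masks and counts the distinct statement texts.
    static int distinctStatements() {
        Set<String> texts = new HashSet<String>();
        for (int mask = 0; mask < (1 << COLUMNS.length); mask++) {
            texts.add(sqlFor(mask));
        }
        return texts.size();
    }
}
```

Thirty-two statements is a small number for a statement cache, so the usual objection to the dynamic approach (too many distinct statements) does not really apply here.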
I believe this should work (I haven't tested it, though):
SELECT * FROM table
WHERE
field1 = CASE
WHEN ? IS NULL THEN field1
ELSE ?
END AND
field2 = CASE
WHEN ? IS NULL THEN field2
ELSE ?
END AND .... etc
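// Java code
if (optional1 != null) {
    // optional1 supplied: bind it to both placeholders of its CASE expression
    prepared.setString(1, optional1);
    prepared.setString(2, optional1);
} else {
    // not supplied: bind SQL NULL so that "? IS NULL" is true
    prepared.setNull(1, java.sql.Types.VARCHAR);
    prepared.setNull(2, java.sql.Types.VARCHAR);
}
// etc.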
We have the following code:
String templateQuery = "select * from my_table where col1=$1 or col2 like '%$2.$1'";
String tmp = templateQuery;
for (int i = 1; i <= maxCols; i++) {
    tmp = tmp.replaceAll("\\$" + i, data[i - 1]);
}
This code works fine, as maxCols never exceeds 10. But my colleague disagrees, stating that this code consumes too much memory. Can you help us?
EDIT:
I have replaced the initial templateQuery with a more realistic one. Also, templateQuery can potentially be a big string.
EDIT 2:
Thanks to those who have pointed out the SQL injection problem.
Don't do this.
Not for performance reasons (which will be minuscule compared with the cost of the database query), but to avoid SQL injection attacks. What happens if data[0] is actually the string
' OR 'x' = 'x
?
Then you'll end up with a SQL statement of:
SELECT * FROM my_table WHERE col1='' OR 'x' = 'x'
which I think we can agree isn't what you wanted.
Use a parameterized SQL statement instead (PreparedStatement) and get the database driver to send the parameter values separately.
EDIT: In other comments, the OP has specified that the template string can be quite long, and some parameters may actually involve multiple initial values combined together. I still say that the cost of replacement is likely to be insignificant in the grand scheme of things, and I still say that PreparedStatement is the way to go. You should perform whatever combining operations you need to on the input before setting them as the values for the PreparedStatement - so the template may need the SQL with SQL placeholders, and then "subtemplates" to work out how to get from your input to the parameters for the PreparedStatement. Whatever you do, putting the values directly into the SQL is the wrong approach.
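A sketch of the "SQL with placeholders plus subtemplates" idea, assuming each ? gets its value from a small template such as %$2.$1 applied to the input array. SubtemplateQuery is a hypothetical class. The expansion is a single left-to-right pass, so $10 is read as one reference rather than $1 followed by 0, and the expanded strings only ever become bind values, never SQL text.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the SQL only contains ? placeholders; each placeholder
// has a "subtemplate" describing how to build its value from the input data.
class SubtemplateQuery {
    final String sql;
    final String[] subtemplates;

    SubtemplateQuery(String sql, String... subtemplates) {
        this.sql = sql;
        this.subtemplates = subtemplates;
    }

    // Single pass: "$<digits>" becomes data[digits - 1]; all other text is copied.
    static String expand(String template, String[] data) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == '$') {
                int j = i + 1;
                while (j < template.length() && Character.isDigit(template.charAt(j))) {
                    j++;
                }
                if (j > i + 1) {
                    out.append(data[Integer.parseInt(template.substring(i + 1, j)) - 1]);
                    i = j;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // One value per ? placeholder, in order.
    List<String> parameterValues(String[] data) {
        List<String> values = new ArrayList<String>();
        for (String t : subtemplates) {
            values.add(expand(t, data));
        }
        return values;
    }

    // The expanded strings are bound as values, so they are never parsed as SQL.
    void bind(PreparedStatement ps, String[] data) throws SQLException {
        List<String> values = parameterValues(data);
        for (int i = 0; i < values.size(); i++) {
            ps.setString(i + 1, values.get(i));
        }
    }
}
```

For the realistic template in the question this would be new SubtemplateQuery("select * from my_table where col1 = ? or col2 like ?", "$1", "%$2.$1"); an input like ' OR 'x' = 'x then ends up inside a bind value instead of changing the query.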
Why aren't you just using a PreparedStatement with replacement parameters?
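String templateQuery = "SELECT * FROM my_table WHERE col1 = ?";
PreparedStatement ps = con.prepareStatement(templateQuery);
// One setString call per ? placeholder (JDBC indexes are 1-based),
// so data.length must equal the number of placeholders in the SQL.
for (int i = 0; i < data.length; i++) {
    ps.setString(i + 1, data[i]);
}
ResultSet rs = ps.executeQuery();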
You're otherwise vulnerable to SQL injection if you use string replacement like you have.
He is correct, because you create up to maxCols temporary Strings, one per replaceAll call.
I realize this is for SQL commands. If so, why don't you use PreparedStatement (http://download.oracle.com/javase/1.4.2/docs/api/java/sql/PreparedStatement.html) for this task?
Also, for formatting strings, rather than substituting manually, use Formatter; it is much more elegant: http://download.oracle.com/javase/1.5.0/docs/api/java/util/Formatter.html
Whether this consumes too much memory is open to debate (what's "too much"?)
Nonetheless, for this kind of stuff you should use PreparedStatement. It allows you to do pretty much exactly what you're trying to achieve, but in a much cleaner fashion.
Your colleague is right in that every string replacement creates a new copy of the string. (However, the cost of these is probably negligible with less than 10 parameters.) Moreover, for every execution of this query the SQL engine needs to parse it afresh, which consumes far more additional resources each time.
The potentially bigger problem, though, is that the code is susceptible to SQL injection. If the input data comes from an external source, a hacker can pass in a parameter such as "col1; drop table my_table;", effectively deleting your whole table.
All of these can be solved by using a PreparedStatement instead.
Let's say I have a basic query like:
SELECT a, b, c FROM x WHERE y=[Z]
In this query, [Z] is a "variable" with different values injected into the query.
Now consider a situation where we want to do the same query with 2 known different values of [Z], say Z1 and Z2. We can make two separate queries:
SELECT a, b, c FROM x WHERE y=Z1
SELECT a, b, c FROM x WHERE y=Z2
Or perhaps we can programmatically craft a different query like:
SELECT a, b, c FROM x WHERE y in (Z1, Z2)
Now we only have one query (1 < 2), but the query construction and result set deconstruction becomes slightly more complicated, since we're no longer doing straightforward simple queries.
Questions:
What is this kind of optimization called? (Is it worth doing?)
How can it be implemented cleanly from a Java application?
Do existing Java ORM technologies help?
What is this kind of optimization called?
I'm not sure if there is a "proper" term for it, but I've heard it called query batching or just plain batching.
(Is it worth doing?)
It depends on:
whether it is worth the effort optimizing the query at all,
the number of elements in the set; i.e. ... IN ( ... ),
the overheads of making a JDBC request versus the costs of query compilation, etc.
But in the right circumstances this is definitely a worthwhile optimization.
How can it be implemented cleanly from a Java application?
It depends on your definition of "clean" :-)
Do existing Java ORM technologies help?
It depends on the specific ORM technology you are talking about, but (for example) the Hibernate HQL language supports the constructs that would allow you to do this kind of thing.
An RDBMS can normally return the result of a query with IN in equal or less time than it takes to execute two queries.
If there is no index on column Y, then a full table scan is required. With two queries, two table scans will be performed instead of one.
If there is an index, then the single value in the WHERE clause, or the values in the IN list, are used one at a time to look up the index. When some rows are found for one of the values in the IN list, they are added to the returned result.
So it is better to use the IN predicate from the performance point of view.
When Y represents a column with unique values, then it is easy to decompose the result. Otherwise, there is slightly more work.
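A sketch of the decomposition step, assuming the query is extended to also select y (for example SELECT a, b, c, y FROM x WHERE y IN (?, ?)) and that each row has been read into an Object[]. InResultSplitter is a hypothetical name. It works whether or not y is unique: each requested key gets its own list, possibly empty.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: splits rows returned by "... WHERE y IN (?, ?)" into one
// list per requested key, as if each key had been queried separately.
class InResultSplitter {
    // keyIndex is the position of the y column within each Object[] row.
    // Keys must have the same Java type the driver returns for y
    // (e.g. java.math.BigDecimal when an Oracle NUMBER is read with getObject).
    static Map<Object, List<Object[]>> splitByKey(List<Object[]> rows, int keyIndex, List<?> requestedKeys) {
        Map<Object, List<Object[]>> byKey = new LinkedHashMap<Object, List<Object[]>>();
        for (Object key : requestedKeys) {
            byKey.put(key, new ArrayList<Object[]>()); // keys with no rows keep an empty list
        }
        for (Object[] row : rows) {
            List<Object[]> bucket = byKey.get(row[keyIndex]);
            if (bucket != null) {
                bucket.add(row);
            }
        }
        return byKey;
    }
}
```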
I honestly can't say how much of a hit (if any) you will take if you run these two prepared queries (even using plain JDBC) instead of combining them with an IN clause.
If you have an array or List of values, you could manually build the prepare statement using JDBC:
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;
import org.apache.commons.lang3.StringUtils;

// Assuming values is an int[] and conn is a java.sql.Connection
// Also uses Apache Commons Lang StringUtils
StringBuilder query = new StringBuilder("SELECT a, b, c FROM x WHERE y IN (");
query.append(StringUtils.join(Collections.nCopies(values.length, "?"), ','));
query.append(")");
PreparedStatement stmt = conn.prepareStatement(query.toString());
for (int i = 0; i < values.length; i++) {
    stmt.setInt(i + 1, values[i]);
}
ResultSet rs = stmt.executeQuery();
// Read the results from rs after this
Note: I haven't actually tested this. In theory, if you used this a lot, you'd generalize this and make it a method.
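One way to generalize the snippet into a method, sketched without the Commons dependency. One Oracle-specific detail is worth handling: an IN list may contain at most 1000 expressions (ORA-01795), so the values are split into chunks of at most 1000, with one statement per chunk. InListQueries and its methods are hypothetical names.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds "... IN (?,?,...)" statements without Commons Lang
// and splits large value lists to respect Oracle's 1000-expression limit.
class InListQueries {
    static final int MAX_IN_LIST = 1000; // ORA-01795 beyond this

    // Returns "?,?,...,?" with n placeholders; IN () would be invalid SQL.
    static String placeholders(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("IN list must not be empty");
        }
        StringBuilder sb = new StringBuilder("?");
        for (int i = 1; i < n; i++) {
            sb.append(",?");
        }
        return sb.toString();
    }

    // One SQL string per chunk of at most MAX_IN_LIST values.
    static List<String> buildQueries(String selectPrefix, int valueCount) {
        List<String> queries = new ArrayList<String>();
        for (int start = 0; start < valueCount; start += MAX_IN_LIST) {
            int size = Math.min(MAX_IN_LIST, valueCount - start);
            queries.add(selectPrefix + " IN (" + placeholders(size) + ")");
        }
        return queries;
    }

    // Binds the values belonging to chunk number chunkIndex (0-based).
    static void bindChunk(PreparedStatement ps, int[] values, int chunkIndex) throws SQLException {
        int start = chunkIndex * MAX_IN_LIST;
        int end = Math.min(start + MAX_IN_LIST, values.length);
        for (int i = start; i < end; i++) {
            ps.setInt(i - start + 1, values[i]);
        }
    }
}
```

Each chunk is executed separately and the rows are combined in Java; an empty value list yields no queries at all, since IN () is not valid SQL.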
Note that an "in" (where blah in ( 1, 5, 10 ) ) is the same as writing "where blah = 1 OR blah = 5 OR blah = 10". This is important if you are using, say, Apache Torque which would create lovely prepared statements except in the case of an "in" clause. (That might be fixed by now.)
And the difference in performance that we found between the unprepared in clause and the prepared ORs was huge.
So a number of ORMs handle it, but not all of 'em handle it well. Be sure to examine the queries sent to the database.
And while deconstructing the combined result set from a single query might be more difficult than handling a single result, it's probably a lot easier than trying to combine two result sets from two queries. And probably significantly faster if a lot of duplicates are involved.
I want to issue a query like the following:
select max(col1), f(:1, col2) from t group by f(:1, col2)
where :1 is a bind variable. Using PreparedStatement, if I say
connection.prepareStatement
("select max(col1), f(?, col2) from t group by f(?, col2)")
I get an error from the DBMS complaining that f(?, col2) is not a GROUP BY expression.
How does one normally solve this in JDBC?
I suggest re-writing the statement so that there is only one bind argument.
This approach is kind of ugly, but returns the result set:
select max(col1)
, f_col2
from (
select col1
, f(? ,col2) as f_col2
from t
)
group
by f_col2
This re-written statement has a reference to only a single bind argument, so now the DBMS sees the expressions in the GROUP BY clause and the SELECT list are identical.
HTH
[EDIT]
(I wish there were a prettier way; this is why I prefer the named bind argument approach that Oracle uses. With the Perl DBI driver, positional arguments are converted to named arguments in the statement actually sent to Oracle.)
I didn't see the problem at first; I didn't understand the original question. (Apparently, several other people missed it too.) But after running some test cases, it dawned on me what the problem was and what the question was asking.
Let me see if I can state the problem: how to get two separate (positional) bind arguments to be treated (by the DBMS) as if they were two references to the same (named) bind argument.
The DBMS is expecting the expression in the GROUP BY to match the expression in the SELECT list. But the two expressions are considered DIFFERENT even when the expressions are identical, when the only difference is that each expression references a different bind variable. (We can demonstrate some test cases that at least some DBMS will allow, but there are more general cases that will raise an exception.)
At this point the short answer is, that's got me stumped. The suggestion I have (which may not be an actual answer to the original question) is to restructure the query.
[/EDIT]
I can provide more details if this approach doesn't work, or if you have some other problem figuring it out. Or if there's a problem with performance (I can see the optimizer choosing a different plan for the re-written query, even though it returns the specified result set. For further testing, we'd really need to know what DBMS, what driver, statistics, etc.)
EDIT (eight and a half years later)
Another attempt at a query rewrite. Again, the only solution I come up with is a query with one bind placeholder. This time, we stick it into an inline view that returns a single row, and join that to t. I can see what it's doing; I'm not sure how the Oracle optimizer will see this. We may want (or need) to do an explicit conversion, e.g. TO_NUMBER(?) AS param, TO_DATE(?,'...') AS param or TO_CHAR(?) AS param, depending on the datatype of the bind parameter and the datatype we want returned from the view.
(This is how I would do it in MySQL. The original query in my answer does the join operation inside the inline view (MySQL derived table), and we want to avoid materializing a huge derived table if we can. Then again, MySQL would probably let the original query slide as long as sql_mode doesn't include ONLY_FULL_GROUP_BY. MySQL would also let us drop the FROM DUAL.)
SELECT MAX(t.col1)
, f( v.param ,t.col2)
FROM t
CROSS
JOIN ( SELECT ? AS param FROM DUAL) v
GROUP
BY f( v.param ,t.col2)
According to the answer from MadusankaD, within the past eight years, Oracle has added support for reusing the same named bind parameters in the JDBC driver, and retaining equivalence. (I haven't tested that, but if that works now, then great.)
Even if you issue a query through the JDBC driver (using PreparedStatement) like this:
select max(col1), f(:1, col2) from t group by f(:1, col2)
the JDBC driver ultimately rewrites it into the query below before sending it to the database for parsing, even though you used the same bind variable name in both places:
select max(col1), f(:1, col2) from t group by f(:2, col2)
But Oracle will not recognize this as a valid GROUP BY clause.
Also, the standard JDBC API doesn't support named bind variables.
For that, you can use the OraclePreparedStatement class for your connection, which means using Oracle's JDBC driver. Then you can use named bind variables. That will solve your issue.
Starting from Oracle Database 10g JDBC drivers, bind by name is supported using the setXXXAtName methods.
http://docs.oracle.com/cd/E24693_01/java.11203/e16548/apxref.htm#autoId20
Did you try using ? rather than the named bind variables? Also, which driver are you using? I tried this trivial example using the thin driver, and it seemed to work fine:
PreparedStatement ps = con.prepareStatement("SELECT COUNT(*), TO_CHAR(SYSDATE, ?) FROM DUAL GROUP BY TO_CHAR(SYSDATE, ?)");
ps.setString(1, "YYYY");
ps.setString(2, "YYYY");
ps.executeQuery();
In the second case, there are actually two variables - you will need to send them both with the same value.