The Performance Consequences of Parameterizing Constants in PreparedStatement Queries

When using JDBC's PreparedStatements to query Oracle, consider this:
String qry1 = "SELECT col1 FROM table1 WHERE rownum=? AND col2=?";
String qry2 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=?";
String qry3 = "SELECT col1 FROM table1 WHERE rownum=1 AND col2=" + someVariable ;
The logic dictates that the value of rownum is always a constant (1 in this example), while the value of col2 varies.
Question 1: Are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry1 where rownum value is parameterized, over qry2 where rownum's constant value is hardcoded?
Question 2: Ignoring non-performance considerations (such as SQL injection, readability, etc.), are there any Oracle server performance advantages (query compilation, caching, etc.) to using qry2 over qry3 (in which the value of col2 is explicitly appended, not parameterized)?

Answer 1: There are no performance advantages to using qry1 (a softcoded query) over qry2 (a query with reasonable bind variables).
Bind variables improve performance by reducing query parsing; if the bind variable is a constant there is no extra parsing to avoid.
(There are probably some weird examples where adding extra bind variables improves the performance of one specific query. Like with any forecasting program, occasionally if you feed bad information to the Oracle optimizer the result will be better. But it's important to understand that those are exceptional cases.)
Answer 2: There are many performance advantages to using qry2 (a query with reasonable bind variables) over qry3 (a hardcoded query).
Bind variables allow Oracle to reuse much of the work that goes into query parsing (query compilation). For example, for each query Oracle needs to check that the user has access to view the relevant tables. With bind variables, that work only needs to be done once for all executions of the query.
Bind variables also allow Oracle to use some extra optimization tricks that only occur after the Nth run. For example, Oracle can use cardinality feedback to improve the second execution of a query. When Oracle makes a mistake in a plan, for example if it estimates a join will produce 1 row when it really produces 1 million, it can sometimes record that mistake and use that information to improve the next run. Without bind variables the next run will be different and it won't be able to fix that mistake.
Bind variables also allow for many different plan management features. Sometimes a DBA needs to change an execution plan without changing the text of the query. Features like SQL plan baselines, profiles, outlines, and DBMS_ADVANCED_REWRITE will not work if the query text is constantly changing.
On the other hand, there are a few reasonable cases where it's better to hard-code the queries. Occasionally an Oracle feature like partition pruning cannot understand the expression and it helps to hardcode the value. For large data warehouse queries the extra time to parse a query may be worth it if the query is going to run for a long time anyway.
(Caching is unlikely to affect either scenario. Result caching of a statement is rare; it's much more likely that Oracle will cache only the blocks of the tables used in the statement. The buffer cache probably does not care whether those blocks are accessed by one statement many times or by many statements one time.)
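As a minimal sketch of the qry2 pattern discussed above, the following prepares the statement once and rebinds only col2 on each execution, so the statement text sent to Oracle never changes. The class and method names are hypothetical, and col1 and col2 are assumed to be string columns, which the question does not state.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question.
final class Qry2Example {
    // rownum is a true constant, so it stays a literal; only col2 varies and is bound.
    static final String QRY2 = "SELECT col1 FROM table1 WHERE rownum = 1 AND col2 = ?";

    // Runs the same prepared statement once per col2 value.
    // Assumption: col1 and col2 are character columns.
    static List<String> lookupAll(Connection conn, List<String> col2Values) throws SQLException {
        List<String> results = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(QRY2)) {
            for (String value : col2Values) {
                ps.setString(1, value); // identical SQL text on every execution
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        results.add(rs.getString(1));
                    }
                }
            }
        }
        return results;
    }
}
```

With qry3, by contrast, each distinct value of someVariable would produce a different statement text, so none of the reuse described above would apply.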

Related

Add limit and offset to query that was created from a String

I have query as String like
select name from employee
and want to limit the number of rows with limit and offset.
Is this possible with jOOQ and how do I do that?
Something like:
dsl.fetch("select name from employee").limit(10).offset(10);

Yes you're close, but you cannot use fetch(sql), because that eagerly executes the query and it will be too late to append LIMIT and OFFSET. I generally don't recommend the approach offered by Sergei Petunin, because that way, you will tell the RDBMS less information about what you're going to do. The execution plan and resource allocations are likely going to be better if you actually use LIMIT and OFFSET.
There are two ways to do what you want to achieve:
Use the parser
You can use DSLContext.parser() to parse your SQL query and then modify the resulting SelectQuery, or create a derived table from that. Creating a derived table is probably a bit cleaner:
dsl.selectFrom(dsl.parser().parse("select name from employee"))
.limit(10)
.offset(10)
.fetch();
The drawback is that the parser will have to understand your SQL string. Some vendor specific features will no longer be available.
The advantage (starting from jOOQ 3.13) is that you will be able to provide your generated code with attached converters and data type bindings this way, as jOOQ will "know" what the columns are.
Use plain SQL
You were already using plain SQL, but the wrong way. Instead of fetching the data eagerly, just wrap your query in DSL.table() and then use the same approach as above.
When using plain SQL, you will have to manually make sure that the resulting SQL is syntactically correct. This includes wrapping your query in parentheses, and possibly aliasing it, depending on the dialect you're using:
dsl.selectFrom(table("(select name from employee)").as("t"))
.limit(10)
.offset(10)
.fetch();

The best thing you can do with a string query is to create a ResultQuery from it. It allows you to limit the maximum amount of rows fetched by the underlying java.sql.Statement:
create.resultQuery("select name from employee").maxRows(10).fetch();
or to fetch lazily and then scroll through the cursor:
create.resultQuery("select name from employee").fetchLazy().fetch(10);
Adding an offset or a limit to a query is only possible using a SelectQuery, but I don't think there's any way to transform a string query into a SelectQuery in jOOQ.
Actually, if you store SQL queries as strings in the database, then you are already in a non-typesafe area, and might as well append OFFSET x LIMIT y directly to a string-based query. Depending on the complexity of your queries, it might work.

Better to query once, then organize objects based on returned column value, or query twice with different conditions?

I have a table which I need to query, then organize the returned objects into two different lists based on a column value. I can either query the table once, retrieving the column by which I would differentiate the objects and arrange them by looping through the result set, or I can query twice with two different conditions and avoid the sorting process. Which method is generally better practice?
MY_TABLE
NAME AGE TYPE
John 25 A
Sarah 30 B
Rick 22 A
Susan 43 B
Either SELECT * FROM MY_TABLE, then sort in code based on returned types, or
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'A' followed by
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'B'

Logically, a DB query from Java code will be more expensive than a loop within the code, because querying the DB involves several steps such as connecting to the DB, creating the SQL query, firing the query, and getting the results back.
Besides, something can go wrong between firing the first and second query.
With an optimized single query and a loop in the code, you can save a lot of time compared to firing two queries.
In your case, you can sort in the query itself if it helps:
SELECT * FROM MY_TABLE ORDER BY TYPE
In the future, if more types are added to your table, you will not need to fire an additional query to retrieve them.
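The "query once, then split in code" approach could be sketched roughly as below. It groups rows that have already been fetched into one list per TYPE value, so a new type needs no code change. The Person class and the groupByType method are hypothetical names introduced for illustration, and the sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: split already-fetched rows into one list per TYPE value.
final class GroupByType {
    // Simple holder for one row of MY_TABLE (NAME, AGE, TYPE).
    static final class Person {
        final String name;
        final int age;
        final String type;

        Person(String name, int age, String type) {
            this.name = name;
            this.age = age;
            this.type = type;
        }
    }

    // Single pass over the rows; insertion order of types and rows is preserved.
    static Map<String, List<Person>> groupByType(List<Person> rows) {
        Map<String, List<Person>> byType = new LinkedHashMap<>();
        for (Person p : rows) {
            byType.computeIfAbsent(p.type, k -> new ArrayList<>()).add(p);
        }
        return byType;
    }
}
```

Because the grouping key is read from each row, the rows do not need to arrive sorted, so the ORDER BY is optional for this particular approach.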

It is heavily dependent on the context. If each list is really huge, I would let the database do the hard part of the job with 2 queries. Conversely, in a web application using a farm of application servers and a central database, I would use one single query.
For the general use case, IMHO, I would save database resources, because the database is a common point of congestion, and use only one query.
The only objective argument I can find is that the splitting of the list occurs in memory with a hyper simple algorithm and in a single JVM, whereas each query requires a bit of initialization and may involve disk access or loading of index pages.

In general, one query performs better.
Also, by issuing two queries you can potentially get inconsistent results (which may be fixed with a higher transaction isolation level, though).
In any case, I believe you still need to iterate through the result set (either directly or by using a framework's methods that return collections).

From the database point of view, you optimally have exactly one statement that fetches exactly everything you need and nothing else. Therefore, your first option is better. But don't generalize that answer in a way that makes you query more data than needed. It's a common mistake for beginners to select all rows from a table (no where clause) and do the filtering in code instead of letting the database do its job.

It also depends on your data set volume. For instance, if you have a large data set, doing a select * without any condition might take some time, but if you have an index on your 'TYPE' column, then adding a where clause will reduce the time taken to execute the query. If you are dealing with a small data set, then doing a select * followed by your logic in the Java code is a better approach.

There are four main bottlenecks involved in querying a database.
The query itself - how long the query takes to execute on the server depends on indexes, table sizes etc.
The data volume of the results - there could be hundreds of columns or huge fields and all this data must be serialised and transported across the network to your client.
The processing of the data - Java must walk the query results gathering the data it wants.
Maintaining the query - it takes manpower to maintain queries, simple ones cost little but complex ones can be a nightmare.
By careful consideration it should be possible to work out a balance between all four of these factors - it is unlikely that you will get the right answer without doing so.

You can query by two conditions:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B'
This will do both for you at once, and if you want them sorted, you could do the same, but just add an order by keyword:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B' ORDER BY TYPE ASC
This will sort the results by type, in ascending order.
EDIT:
I didn't notice that originally you wanted two different lists. In that case, you could just do this query, and then find the index where the type changes from 'A' to 'B' and copy the data into two arrays.

JDBC - Best way for performance of a PreparedStatement with optional criteria

I am trying to improve the performance of my application. To do this, I am replacing the current statements with prepared statements to take advantage of the Oracle cache.
But several of my queries are dynamic (conditional criteria and tables), and I would like to know the best solution to reduce the execution time.
On Stack Overflow, I found two solutions.
The first is to build the query with if statements:
if(a.equals("Test")) {
whereClause.append ("COLUMN1 = ? ");
}
if(b.equals("Test2")) {
whereClause.append ("COLUMN2 = ? ");
} etc...
The second is to create one query with all criteria but to add "OR var is null"
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
In your opinion, which is the best approach for performance, knowing that a request may contain 40 to 50 criteria and that some tables can be optional too? Or is there another solution?
The database in use is Oracle, accessed through a DataSource, with JDK 1.5.
Thank you in advance for your help.

First, JDBC and Oracle are capable of handling and caching multiple prepared statements.
First approach:
if(a.equals("Test")) {
whereClause.append ("COLUMN1 = ? ");
}
if(b.equals("Test2")) {
whereClause.append ("COLUMN2 = ? ");
}
will certainly lead to more distinct statements and thus to more prepared statements, but each of these statements can be optimized at preparation time, as no variable is optional.
Second approach:
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
will result in exactly one prepared statement, but during prepare the query-plan needs to include the possibility that all parameters are non-null.
So it really becomes a matter of benchmarking and of how many optional parameters there are. If you have too many prepared statements, one might not be cached (but it might also be so seldom used that this does not affect overall performance).
I also don't know how much overhead the optional parameters in the second approach create if they are evaluated for each row because the optimizer could not remove them.
Finally, if you have indexes that cover some combinations of your optional parameters, the first approach should be faster, because during prepare the possible indexes are evaluated and the best one is chosen (query plan).
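For the first approach, one possible sketch is a small builder that emits a condition and a bind value only for the criteria that are actually set. Everything here is hypothetical (the DynamicWhere class, its fields, and the build method), and it assumes Java 8 or later rather than the JDK 1.5 mentioned in the question. Column names must come from trusted code, never from user input, since they are concatenated into the SQL text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of approach 1: only active criteria appear in the SQL text.
final class DynamicWhere {
    final String sql;          // statement text with one ? per active criterion
    final List<Object> params; // values to bind, in placeholder order

    private DynamicWhere(String sql, List<Object> params) {
        this.sql = sql;
        this.params = Collections.unmodifiableList(params);
    }

    // criteria maps a trusted column name to its value; null means "not used".
    // Pass a map with a stable iteration order (e.g. LinkedHashMap) so the same
    // set of criteria always yields the same SQL text and can hit the cache.
    static DynamicWhere build(String baseSql, Map<String, Object> criteria) {
        List<String> conditions = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        for (Map.Entry<String, Object> e : criteria.entrySet()) {
            if (e.getValue() != null) {
                conditions.add(e.getKey() + " = ?");
                params.add(e.getValue());
            }
        }
        String sql = conditions.isEmpty()
                ? baseSql
                : baseSql + " WHERE " + String.join(" AND ", conditions);
        return new DynamicWhere(sql, params);
    }
}
```

The result would then be passed to conn.prepareStatement(w.sql), binding each value with ps.setObject(i + 1, w.params.get(i)). Keeping the criteria in a fixed order matters because statement caches match on the exact SQL text.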

The biggest win with prepared statements is that the query plan (optimization) is computed ahead of time. To be honest, I doubt the gain will be very big if you have 20 criteria like that and only a few are actually used. Why don't you test it and see which is quickest?
Also, I think making a stored procedure out of this could be very advantageous.

I would use a PreparedStatement with a dynamic query, without any additional logic. If a query repeats, JDBC will find the previously executed cached statement.

I agree with Evgeniy, PreparedStatements are the way to go. If you have many parameters, the upkeep is easier if you use named parameters instead of question marks. See: http://www.javaworld.com/article/2077706/core-java/named-parameters-for-preparedstatement.html. I have used this with about 20 criteria.

SQL optimization options in Java

Let's say I have a basic query like:
SELECT a, b, c FROM x WHERE y=[Z]
In this query, [Z] is a "variable" with different values injected into the query.
Now consider a situation where we want to do the same query with 2 known different values of [Z], say Z1 and Z2. We can make two separate queries:
SELECT a, b, c FROM x WHERE y=Z1
SELECT a, b, c FROM x WHERE y=Z2
Or perhaps we can programmatically craft a different query like:
SELECT a, b, c FROM x WHERE y in (Z1, Z2)
Now we only have one query (1 < 2), but the query construction and result set deconstruction becomes slightly more complicated, since we're no longer doing straightforward simple queries.
Questions:
What is this kind of optimization called? (Is it worth doing?)
How can it be implemented cleanly from a Java application?
Do existing Java ORM technologies help?

What is this kind of optimization called?
I'm not sure if there is a "proper" term for it, but I've heard it called query batching or just plain batching.
(Is it worth doing?)
It depends on:
whether it is worth the effort optimizing the query at all,
the number of elements in the set; i.e. ... IN ( ... ),
the overheads of making a JDBC request versus the costs of query compilation, etc.
But in the right circumstances this is definitely a worthwhile optimization.
How can it be implemented cleanly from a Java application?
It depends on your definition of "clean" :-)
Do existing Java ORM technologies help?
It depends on the specific ORM technology you are talking about, but (for example) the Hibernate HQL language supports the constructs that would allow you to do this kind of thing.

An RDBMS can normally return the result of a query with IN in equal or less time than it takes to execute two queries.
If there is no index on column Y, then a full table scan is required. With two queries, two table scans will be performed instead of one.
If there is an index, then the single value in the WHERE clause, or the values in the IN list, are used one at a time to look up the index. When some rows are found for one of the values in the IN list, they are added to the returned result.
So it is better to use the IN predicate from the performance point of view.
When Y represents a column with unique values, then it is easy to decompose the result. Otherwise, there is slightly more work.

I honestly can't say how much of a hit (if any) you will take if you run these two prepared queries (even using plain JDBC) instead of combining them with an IN clause.

If you have an array or List of values, you could manually build the prepared statement using JDBC:
// Assuming values is an int[] and conn is a java.sql.Connection
// Also uses Apache Commons StringUtils
StringBuilder query = new StringBuilder("SELECT a, b, c FROM x WHERE y IN (");
query.append(StringUtils.join(Collections.nCopies(values.length, "?"), ','));
query.append(")");
PreparedStatement stmt = conn.prepareStatement(query.toString());
for (int i = 0; i < values.length; i++) {
stmt.setInt(i + 1, values[i]);
}
stmt.execute();
// Get results after this
Note: I haven't actually tested this. In theory, if you used this a lot, you'd generalize this and make it a method.
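Generalized into a method, the placeholder-building step might look like the sketch below. It uses only the standard library (String.join, available since Java 8) instead of Apache Commons, and rejects an empty list because IN () is not valid SQL. The InClause class and inQuery method are hypothetical names.

```java
import java.util.Collections;

// Hypothetical helper: builds the IN (...) query text for a given number of values.
final class InClause {
    static String inQuery(int valueCount) {
        if (valueCount < 1) {
            // "IN ()" is a syntax error, so refuse to build it.
            throw new IllegalArgumentException("IN list needs at least one value");
        }
        // One "?" per value, comma separated, e.g. "?,?,?" for three values.
        String placeholders = String.join(",", Collections.nCopies(valueCount, "?"));
        return "SELECT a, b, c FROM x WHERE y IN (" + placeholders + ")";
    }
}
```

Note that each distinct list length produces a distinct statement text, so lists of widely varying sizes will each be parsed and cached separately.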

Note that an "in" (where blah in ( 1, 5, 10 ) ) is the same as writing "where blah = 1 OR blah = 5 OR blah = 10". This is important if you are using, say, Apache Torque which would create lovely prepared statements except in the case of an "in" clause. (That might be fixed by now.)
And the difference in performance that we found between the unprepared in clause and the prepared ORs was huge.
So a number of ORMs handle it, but not all of 'em handle it well. Be sure to examine the queries sent to the database.
And while deconstructing the combined result set from a single query might be more difficult than handling a single result, it's probably a lot easier than trying to combine two result sets from two queries. And probably significantly faster if a lot of duplicates are involved.

Is it possible to use GROUP BY with bind variables?

I want to issue a query like the following
select max(col1), f(:1, col2) from t group by f(:1, col2)
where :1 is a bind variable. Using PreparedStatement, if I say
connection.prepareStatement
("select max(col1), f(?, col2) from t group by f(?, col2)")
I get an error from the DBMS complaining that f(?, col2) is not a GROUP BY expression.
How does one normally solve this in JDBC?

I suggest re-writing the statement so that there is only one bind argument.
This approach is kind of ugly, but returns the result set:
select max(col1)
, f_col2
from (
select col1
, f(? ,col2) as f_col2
from t
)
group
by f_col2
This re-written statement has a reference to only a single bind argument, so now the DBMS sees the expressions in the GROUP BY clause and the SELECT list are identical.
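From JDBC, the rewritten statement could be run roughly as follows. This is only a sketch: the GroupByBind class and run method are hypothetical, the bind is set with setObject because the parameter type of f is not given in the question, and the inline view is given an alias (v), which is an addition here since some DBMSs require one.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the one-placeholder rewrite executed through plain JDBC.
final class GroupByBind {
    // Only one ? appears, so SELECT and GROUP BY both refer to the same f_col2.
    static final String SQL =
            "SELECT MAX(col1), f_col2 "
          + "FROM (SELECT col1, f(?, col2) AS f_col2 FROM t) v "
          + "GROUP BY f_col2";

    // Returns one {max(col1), f_col2} pair per group.
    static List<Object[]> run(Connection conn, Object param) throws SQLException {
        List<Object[]> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setObject(1, param); // assumption: the driver can infer the type
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new Object[] { rs.getObject(1), rs.getObject(2) });
                }
            }
        }
        return rows;
    }
}
```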
HTH
[EDIT]
(I wish there were a prettier way, this is why I prefer the named bind argument approach that Oracle uses. With the Perl DBI driver, positional arguments are converted to named arguments in the statement actually sent to Oracle.)
I didn't see the problem at first; I didn't understand the original question. (Apparently, several other people missed it too.) But after running some test cases, it dawned on me what the problem was, what the question was asking.
Let me see if I can state the problem: how to get two separate (positional) bind arguments to be treated (by the DBMS) as if they were two references to the same (named) bind argument.
The DBMS is expecting the expression in the GROUP BY to match the expression in the SELECT list. But the two expressions are considered DIFFERENT even when the expressions are identical, when the only difference is that each expression references a different bind variable. (We can demonstrate some test cases that at least some DBMS will allow, but there are more general cases that will raise an exception.)
At this point the short answer is, that's got me stumped. The suggestion I have (which may not be an actual answer to the original question) is to restructure the query.
[/EDIT]
I can provide more details if this approach doesn't work, or if you have some other problem figuring it out. Or if there's a problem with performance (I can see the optimizer choosing a different plan for the re-written query, even though it returns the specified result set. For further testing, we'd really need to know what DBMS, what driver, statistics, etc.)
EDIT (eight and a half years later)
Another attempt at a query rewrite. Again, the only solution I come up with is a query with one bind placeholder. This time, we stick it into an inline view that returns a single row, and join that to t. I can see what it's doing; I'm not sure how the Oracle optimizer will see this. (We may want, or need, to do an explicit conversion, e.g. TO_NUMBER(?) AS param, TO_DATE(?,'...') AS param, TO_CHAR(?) AS param, depending on the datatype of the bind parameter and the datatype we want returned from the view.)
This is how I would do it in MySQL. The original query in my answer does the join operation inside the inline view (MySQL derived table), and we want to avoid materializing a huge derived table if we can. Then again, MySQL would probably let the original query slide as long as sql_mode doesn't include ONLY_FULL_GROUP_BY. (MySQL would also let us drop the FROM DUAL.)
SELECT MAX(t.col1)
, f( v.param ,t.col2)
FROM t
CROSS
JOIN ( SELECT ? AS param FROM DUAL) v
GROUP
BY f( v.param ,t.col2)
According to the answer from MadusankaD, within the past eight years, Oracle has added support for reusing the same named bind parameters in the JDBC driver, and retaining equivalence. (I haven't tested that, but if that works now, then great.)

Even though you issued a query through the JDBC driver (using PreparedStatement) like this:
select max(col1), f(:1, col2) from t group by f(:1, col2)
the JDBC driver ends up rewriting it as the query below before passing it to the database, even though you used the same bind variable name in both places:
select max(col1), f(:1, col2) from t group by f(:2, col2)
Oracle will not recognize this as a valid GROUP BY clause.
Also, the standard JDBC API doesn't support named bind variables.
For that, you can use the OraclePreparedStatement class for your connection, which is specific to the Oracle JDBC driver. Then you can use named bind variables, which will solve your issue.
Starting from Oracle Database 10g JDBC drivers, bind by name is supported using the setXXXAtName methods.
http://docs.oracle.com/cd/E24693_01/java.11203/e16548/apxref.htm#autoId20

Did you try using ? rather than the named bind variables? As well, which driver are you using? I tried this trivial example using the thin driver, and it seemed to work fine:
PreparedStatement ps = con.prepareStatement("SELECT COUNT(*), TO_CHAR(SYSDATE, ?) FROM DUAL GROUP BY TO_CHAR(SYSDATE, ?)");
ps.setString(1, "YYYY");
ps.setString(2, "YYYY");
ps.executeQuery();

In the second case, there are actually two variables - you will need to send them both with the same value.
