Smart logic queries performance inside functions for PostgreSQL - java

Consider the following sql query:
SELECT a,b,c
FROM t
WHERE (id1 = :p_id1 OR :p_id1 IS NULL) AND (id2 = :p_id2 OR :p_id2 IS NULL)
Markus Winand in his book "SQL Performance explained" names this approach as one of the worst performance anti-patterns of all, and explains why (the database has to prepare plan for the worst case when all filters are disabled).
But later he also writes that for the PostgreSQL this problem occurs only when re-using a statement (PreparedStatement) handle.
Assume also now that query above is wrapped into the function, something like:
CREATE FUNCTION func(IN p_id1 BIGINT,IN p_id2 BIGINT)
...
$BODY$
BEGIN
...
END;
$BODY$
So far I have misunderstanding of few points:
Will this problem still occur in case of function wrapping? (I've tried to see the execution plan for the function call, but Postgres doesn't show me the details for the internal function calls even with SET auto_explain.log_nested_statements = ON).
Let's say I'm working with legacy project and can not change the function itself, only java execution code. Will it be better to avoid prepared statement here and use dynamic query each time? (Assuming that execution time is quite long, up to several seconds). Say this, probably, ugly approach:
getSession().doWork(connection -> {
ResultSet rs = connection.createStatement().executeQuery("select * from func("+id1+","+id2+")");
...
})

1.
It depends.
When not using prepared statements, PostgreSQL plans a query every time anew, using parameters values. It is known as custom plan.
With prepared statements (and you're right, PL/pgSQL functions do use prepared statements) it's more complicated. PostgreSQL prepares the statement (parses its text and stores parse tree), but re-plans it each time it is executed. Custom plans are generated at least 5 times. After that the planner considers using a generic plan (i. e. parameter-value-independent) if it's cost is less than the average cost of custom plans generated so far.
Note, that cost of a plan is an estimation of the planner, not real I/O operations or CPU cycles.
So, the problem can occur, but you need some bad luck for that.
2.
The approach you suggested will not work, because it doesn't change behavior of the function.
In general it is not so ugly for PostgreSQL not to use parameters (as it is for e. g. Oracle), because PostgreSQL doesn't have shared cache for plans. Prepared plans are stored in each backend's memory, so re-planning will not affect other sessions.
But as far as I know, currently there is no way to force planner to use custom plans (other than reconnect after 5 executions...).

Related

Clojure.java.jdbc return type for datetime changes based on query structure

I have a Korma based software stack that constructs fairly complex queries against a MySQL database. I noticed that when I am querying for datetime columns, the type that I get back from the Korma query changes depending on the syntax of the SQL query being generated. I've traced this down to the level of clojure.java.jdbc/query. If the form of the query is like this:
select modified from docs order by modified desc limit 10
then I get back maps corresponding to each database row in which :modified is a java.sql.Timestamp. However, sometimes our query generator generates more complex union queries, such that we need to apply an order by ... limit ... constraint to the final result of the union. Korma does this by wrapping the query in parentheses. Even with only a single subquery--i.e., a simple parenthesized select--so long as we add an "outer" order by ..., the type of :modified changes.
(select modified from docs order by modified desc limit 10) order by modified desc
In this case, clojure.java.jdbc/query returns :modified values as strings. Some of our higher level code isn't expecting this, and gets exceptions.
We're using a fork of Korma, which is using an old (0.3.7) version of clojure.java.jdbc. I can't tell if the culprit is clojure.java.jdbc or java.jdbc or MySQL. Anyone seen this and have ideas on how to fix it?
Moving to the latest jdbc in a similar situation changed several other things for us and was a decidedly "non-trvial" task. I would suggest getting off of a korma fork soon and then debugging this.
For us the changes focused around what korma returned on update calls changed between the verions of the backing jdbc. It was well worth getting current even though it's a moderately painful process.
Getting current with jdbc will give you fresh new problems!
best of luck with this :-) These things tend to be fairly specific to the DB server you are using.
Other options for you is to have a policy of aways specifying an order-by parameter or building a library to coerce the strings into dates. Both of these have some long term technical dept problems.

Closing a PreparedStatement after a single execute – is it a design flaw?

I have looked into various places, and have heard a lot of dubious claims, ranging from PreparedStatement should be preferred over Statement everywhere, even if only for the performance benefit; all the way to claims that PreparedStatements should be used exclusively for batched statements and nothing else.
However, there seems to be a blind spot in (primarily online) discussions I have followed. Let me present a concrete scenario.
We have an EDA-designed application with a DB connection pool. Events come, some of them require persistence, some do not. Some are artificially generated (e.g. update/reset something every X minutes, for example).
Some events come and are handled sequentially, but other types of events (also requiring persistence) can (and will) be handled concurrently.
Aside from those artificially generated events, there is no structure in how events requiring persistence arrive.
This application was designed quite a while ago (roughly 2005) and supports several DBMSes. The typical event handler (where persistence is required):
get connection from pool
prepare sql statement
execute prepared statement
process the result set, if applicable, close it
close prepared statement
prepare a different statement, if necessary and handle the same way
return connection to pool
If an event requires batch processing, the statement is prepared once and addBatch/executeBatch methods are used. This is an obvious performance benefit and these cases are not related to this question.
Recently, I have received an opinion, that the whole idea of preparing (parsing) a statement, executing it once and closing is essentially a misuse of PreparedStatement, provides zero performance benefits, regardless of whether server or client prepared statements are used and that typical DBMSes (Oracle, DB2, MSSQL, MySQL, Derby, etc.) will not even promote such a statement to prepared statement cache (or at least, their default JDBC driver/datasource will not).
Moreover, I had to test certain scenarios in dev environment on MySQL, and it seems that the Connector/J usage analyzer agrees with this idea. For all non-batched prepared statements, calling close() prints:
PreparedStatement created, but used 1 or fewer times. It is more efficient to prepare statements once, and re-use them many times
Due to application design choices outlined earlier, having a PreparedStatement instance cache that holds every single SQL statement used by any event for each connection in the connection pool sounds like a poor choice.
Could someone elaborate further on this? Is the logic "prepare-execute (once)-close" flawed and essentially discouraged?
P.S. Explicitly specifying useUsageAdvisor=true and cachePrepStmts=true for Connector/J and using either useServerPrepStmts=true or useServerPrepStmts=false still results in warnings about efficiency when calling close() on PreparedStatement instances for every non-batched SQL statement.
Is the logic prepare-execute [once]-close flawed and essentially discouraged?
I don't see that as being a problem, per se. A given SQL statement needs to be "prepared" at some point, whether explicitly (with a PreparedStatement) or "on the fly" (with a Statement). There may be a tiny bit more overhead incurred if we use a PreparedStatement instead of a Statement for something that will only be executed once, but it is unlikely that the overhead involved would be significant, especially if the statement you cite is true:
typical DBMSes (Oracle, DB2, MSSQL, MySQL, Derby, etc.) will not even promote such a statement to prepared statement cache (or at least, their default JDBC driver/datasource will not).
What is discouraged is a pattern like this:
for (int thing : thingList) {
PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ");
ps.setInt(1, thing);
ps.executeUpdate();
ps.close();
}
because the PreparedStatement is only used once and the same SQL statement is being prepared over and over again. (Although even that might not be such a big deal if the SQL statement and its executation plan are indeed cached.) The better way to do that is
PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ");
for (int thing : thingList) {
ps.setInt(1, thing);
ps.executeUpdate();
}
ps.close();
... or even better, with a "try with resources" ...
try (PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ")) {
for (int thing : thingList) {
ps.setInt(1, thing);
ps.executeUpdate();
}
}
Note that this is true even without using batch processing. The SQL statement is still only prepared once and used several times.
As others already stated, the most expensive part is the parsing the statement in the database. Some database systems (this is pretty much DB dependent – I will speak mainly for Oracle) may profit, if the statement is already parsed in the shared pool. (In Oracle terminology this is called a soft parse that is cheaper than a hard parse - a parse of a new statement). You can profit from soft parse even if you use the prepared statement only once.
So the important task is to give the database a chance to reuse the statement. A typical counter example is the handling of the IN list based on a collection in Hibernate. You end with the statement such as
.. FROM T WHERE X in (?,?,?, … length based on the size of the collection,?,? ,?,?)
You can’t reuse this statement if the size of the collection differ.
A good starting point to get overview of the spectrum of the SQL queries produced by a running application is (by Oracle) the V$SQL view. Filter the PARSING_SCHEMA_NAME with you connection pool user and check the SQL_TEXT and the EXECUTIONS count.
Two extreme situation should be avoided:
Passing parameters (IDs) in the query text (this is well known) and
Reusing statement for different access paths.
An example of the latter is a query that with a provided parameter performs an index access to a limited part of the table, while without the parameter all records should be processed (full table scan). In that case is definitively no problem to create two different statements (as the parsing of both leads to different execution plans).
PreparedStatements are preferable because one is needed regardless of whether you create one programmatically or not; internally the database creates one every time a query is run - creating one programatically just gives you a handle to it. Creating and throwing away a PreparedStatement every time doesn't add much overhead over using Statement.
A large effort is required by the database to create one (syntax checking, parsing, permissions checking, optimization, access strategy, etc). Reusing one bypasses this effort for subsequent executions.
Instead of throwing them away, try either writing the query in such a way that it can be reused, eg by ignoring null input parameters:
where someCol = coalesce(?, someCol)
so if you set the parameter to null (ie "unspecified), the condition succeeds)
or if you absolutely must build the query every time, keep references to the PreparedStatements in a Map where the built query is the key and reuse them if you get a hit. Use a WeakHashMap<String, PreparedStatements> for you map implementation to prevent running out of memory.
PreparedStatement created, but used 1 or fewer times. It is more efficient to prepare statements once, and re-use them many times
I thing you may safely ignore this warning, it is similar to a claim It is more efficient to work first 40 hour in the week, than sleep next 56 hours, eat following 7 hours and the rest is your free time.
You need exactly one execution per event - should you perform 50 to get a better average?
SQL commands that run only once, in terms of performance, just waste database resources (memory, processing) being sent in a Prepared Statement. In other hand, not using Prepared Statement let app vulnerable to SQL injection.
Are security (protection from SQL injection) working against performance (prepared statement that runs just once) ? Yes, but...
But it should not be that way. It's a choice java does NOT implement an interface to let developers call the right database API: SQL commands that run just once AND are properly protected against SQL injection ! Why Java just not implement the correct tool for this specific task?
It could be as follows:
Statement Interface - Different SQL commands could be submitted. One execution of SQL commands. Bind variables not allowed.
PreparedStatement Interface - One SQL command could be submitted. Multiple executions of SQL command. Bind variables allowed.
(MISSING IN JAVA!) RunOnceStatement - One SQL command could be submitted. One execution of SQL command. Bind variables allowed.
For example, the correct routine (API) could be called in Postgres, by driver mapping to:
- Statement Interface - call PQExec()
- PreparedStatement Interface - call PQPrepare() / PQExecPrepare() / ...
- (MISSING IN JAVA!) RunOnceStatement Interface - call PQExecParams()
Using prepared statement in SQL code that runs just once is a BIG performance problem: more processing in database, waste database memory, by maintaining plans that will not called later. Cache plans get so crowed that actual SQL commands that are executed multiple times could be deleted from cache.
But Java does not implement the correct interface, and forces everybody to use Prepared Statement everywhere, just to protect against SQL injection...

Better option to fetch results from database tables

Are there any performance improvement in calling a procedure which returns SYS_RECURSOR or call a query?
For example
CREATE OR REPLACE PROCEDURE my_proc
(
p_id number,
emp_cursor IN OUT SYS_REFCURSOR
)
AS
BEGIN
OPEN emp_cursor for
select * from emp where emp_number=p_id
end;
/
and call the above from Java by registering OUT parameter,pass IN parameter and fetch the results.
Or
From Java get the results from emp table by
preparedStatement = prepareStatement(connection, "select * from emp where emp_number=?", values);
resultSet = preparedStatement.executeQuery();
Which one of the above is a better option to call from Java?
There is no performance difference assuming your prepareStatement method is using the appropriate type for all bind variables. That is, you would need to ensure that you are calling setLong, setDate, setString, etc. depending on the data type of the parameter. If you bind the data incorrectly (i.e. calling setString to bind a numeric value), you may force Oracle to do data type conversion which may prevent the optimizer from using an index that would improve performance.
From a code organization and maintenance standpoint, however, I would rather have the queries in the database rather than in the Java application. If you find that a query is using a poor plan, for example, it's likely to be much easier for a DBA to address the problem if the query is in a stored procedure than if the query is embedded in a Java application. If the query is stored in the database, you can also use the database's dependency tracking functions to more easily do an impact analysis if you need to do something like determine what would be impacted if the emp table needs to change.
Well, I don't think there is major significant difference from the Java invocation standpoint.
Some differencesI can think of are:
You will now have to maintain two different code bases: your Java code and your stored procedures. In case of errors, you will have to debug in two different places, and fix problems in two different places.
Once production-ready, making changes to the database is probably going to require some additional formalisms besides those required to change the Java code deployed.
Another important matter to take into account is database-independence, if you are building a product to work with different kinds of databases, you would be forced to write different versions of your stored procedures and you will have more code to maintain (debug, bugfix, change, etc).
This very important if you're building a product that you intend to deploy in different environments of different (possible yet unknown) clients, wich you cannot predict what RDBMS will be using.
If you want to use an ORM framework i.e. Hibernate, EclipseLink) it will generate pretty optimized queries for you. Plus, it would be more difficult to integrate it later on if you use stored-procedures.
With proper amount of logging is easy to analyze your queries for optimization purposes. You could use JDBC logging or the logging provided by your ORM provider and actually see how the query is being used by the application, how many times, how often, etc, and optimize where it matters.

which is faster? Statement or PreparedStatement

Often, in the network can be found code like this:
private static final String SQL = "SELECT * FROM table_name";
....
and for this SQL query is used PreparedStatement. Why?
As i know, PreparedStatement spend time to precompile SQL statement. It turns out so that the Statement is faster than a PreparedStatement. Or I'm mistaken?
Prepared statements are much faster when you have to run the same statement multiple times, with different data. Thats because SQL will validate the query only once, whereas if you just use a statement it will validate the query each time.
The other benefit of using PreparedStatements is to avoid causing a SQL injection vulnerability - though in your case your query is so simple you haven't encountered that.
For your query, the difference between running a prepared statement vs a statement is probably negligible.
EDIT: In response to your comment below, you will need to look closely at the DAO class to see what it is doing. If for example, each time the method is called it re-creates the prepared statement then you will lose any benefit of using prepared statements.
What you want to achieve, is the encapsulation of your persistence layer so that their is no specific call to MySQL or Postgres or whatever you are using, and at the same time take advantage of the performance and security benefits of things like prepared statements. To do this you need to rely on Java's own objects such as PreparedStatement,.
I personally would build my own DAO class for doing CRUD operations, using Hibernate underneath and the Java Persistence API to encapsulate it all, and that should use prepared statements for the security benefits. If you have a specific use-case for doing repeated operations, then I would be inclined to wrap that within its own object.
Hibernate can be configured to use whatever database vendor you are using via an XML file, and thus it provides really neat encapsulation of your persistence layer. However, it is quite a complicated product to get right!
Most of the time queries are not as simple as your example. If there is any variation to the query, i.e. any parameters that are not known at compile time, you must use PreparedStatement to avoid SQL injection vulnerabilities. This trumps any performance concerns.
If there is any difference between PreparedStatement and Statement, it would be highly dependent on the particular JDBC driver in question and most of the time the penalty will be negligible compare to the cost of going to the database, executing actual query and fetching results back.
As Per the My knowledge PreparedStatement is much faster then statement. Here some reason why preparedstatement is faster then statement please read for more detail.
JDBC API is provide the functionality of connectivity with database. Then we try to execute the query with the use of statement and preparedstatement.
There are four step to execute the query.
Parsing of sql query.
Compile this Query.
optimization of data acquisition path.
execute the query.
Statement interface is suitable when we will not need to execute the query multiple time.
Disadvantages of Statement Interface.
hacker can easily to hack the data. Like suppose we have one query which have the username and password is a parameters you can give the proper parameters is username='abc#example.com' and password ='abc123' actually this is current But hacker can do username='abc#example.com' or '1'=1 and password='' that means you can logged successfully. so that is happening possible in Statement.
And sql validate every time when we fetch the data from database.
So Java has the solution for this above problem that is PreparedStatement.
This interface has many advantages. the main advantages of preparedstatement is sql is not validate the query every time. so you can get the result fast. please read the below more advantages of preparedstatement.
1) We can safely provide the value of query's parameters with setter method.
2) it prevent the SQL injection because it is automatically escapes the special characters.
3) When we use statement above four steps are execute every time But when we use the PreparedStatement only last steps is execute so this is faster then statement.
Faster is not the consideration here. Parsing of the sql will generally be a tiny part of overall execution. See more at When should we use a PreparedStatement instead of a Statement?

Is "LIKE ?" More efficient than LIKE '%'||?||'%'

Recently one of my colleagues made a comment that I should not use
LIKE '%'||?||'%'
rather use
LIKE ?
in the SQL and then replace the LIKE ? marker with LIKE '%'||?||'%' before I execute the SQL. He made the point that with a single parameter marker DB2 database will cache the statement always and thus cut down on the SQL prepare time.
However, I am not sure if it is accurate or not. To me it should be the other way around since we are doing more processing by doing a string replace on the SQL everytime the query is getting executed.
Does anyone know if a single marker really speeds up execution? Just FYI - I am using Spring 2.5 JDBC framework and the DB2 version is 9.2.
My question is - does DB2 treat "LIKE ?" differently from "LIKE '%'||?||'%'" as far as caching and preparation goes.
'LIKE ?' is a PreparedStatement. Prepared statements are an optimization at the JDBC driver level. The thinking is that databases analyze queries to decide how to most efficiently process them. The DB can then cache the resulting query plan, keyed on the full statement. Reusing identical statements reuses the query plan. So basically if you are running the same query multiple times with different comparison strings, and if the query plan stays cached, then yes, using 'LIKE ?' will be faster.
Some useful (though somewhat dated) info on PreparedStatements:
Prepared Statments
More Prepared Statments
I haven't done too much DB2, not since the 90's and I'm not really sure if I'm understanding what your underlying question is. Way back then I got a phone call from the head of the DBA team. "What are you doing different than every other programmer we've got!??" Mind you, this was early in my career, so tentatively I answered, "Nothing....", imagine it in kind of a whiny voice. "Well then, why do your queries take 50% of the cpu resources of any the other guys???". I took a quick poll of all the other guys and found I was the only one using prepared statements. Now under the covers Spring automatically makes prepared statements, and they've improved statement caching in the database a lot over the years, but if you make use of the properly, you can get the speedup there, AND it'll make the statement cache swap things out less often. It really depends on your use case, if you're only going to hit the query once, then there would be no difference, if its a few thousand times, obviously it would make a much greater difference.
in the SQL and then replace the LIKE ? marker with LIKE '%'||?||'%' before I execute the SQL. He made the point that with a single parameter marker DB2 database will cache the statement always and thus cut down on the SQL prepare time.
Unless DB2 is some sort of weird alien SQL database, or if it's driver does some crazy things, then the database server will never see your prepared statement until you actually execute it. So you can swap clauses in and out of the PreparedStatement all day long, and it will have no effect until you actually send it to the server when you execute it.

Categories