I have looked into various places, and have heard a lot of dubious claims, ranging from PreparedStatement should be preferred over Statement everywhere, even if only for the performance benefit; all the way to claims that PreparedStatements should be used exclusively for batched statements and nothing else.
However, there seems to be a blind spot in (primarily online) discussions I have followed. Let me present a concrete scenario.
We have an EDA-designed application with a DB connection pool. Events come in; some of them require persistence, some do not. Some are artificially generated (e.g. update/reset something every X minutes).
Some events come and are handled sequentially, but other types of events (also requiring persistence) can (and will) be handled concurrently.
Aside from those artificially generated events, there is no structure in how events requiring persistence arrive.
This application was designed quite a while ago (roughly 2005) and supports several DBMSes. The typical event handler (where persistence is required):
get connection from pool
prepare sql statement
execute prepared statement
process the result set, if applicable, close it
close prepared statement
prepare a different statement, if necessary and handle the same way
return connection to pool
If an event requires batch processing, the statement is prepared once and addBatch/executeBatch methods are used. This is an obvious performance benefit and these cases are not related to this question.
Recently, I was told that preparing (parsing) a statement, executing it once and closing it is essentially a misuse of PreparedStatement: that it provides zero performance benefit, regardless of whether server or client prepared statements are used, and that typical DBMSes (Oracle, DB2, MSSQL, MySQL, Derby, etc.) will not even promote such a statement to the prepared statement cache (or at least, their default JDBC driver/datasource will not).
Moreover, I had to test certain scenarios in dev environment on MySQL, and it seems that the Connector/J usage analyzer agrees with this idea. For all non-batched prepared statements, calling close() prints:
PreparedStatement created, but used 1 or fewer times. It is more efficient to prepare statements once, and re-use them many times
Due to application design choices outlined earlier, having a PreparedStatement instance cache that holds every single SQL statement used by any event for each connection in the connection pool sounds like a poor choice.
Could someone elaborate further on this? Is the logic "prepare-execute (once)-close" flawed and essentially discouraged?
P.S. Explicitly specifying useUsageAdvisor=true and cachePrepStmts=true for Connector/J and using either useServerPrepStmts=true or useServerPrepStmts=false still results in warnings about efficiency when calling close() on PreparedStatement instances for every non-batched SQL statement.
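For reference, the settings mentioned in the P.S. could be passed to the driver roughly as follows. This is only a sketch: the property names are the Connector/J ones quoted above, while the class name, method name, host and database name are made up for illustration.

```java
import java.util.Properties;

final class ConnectorJSettings {
    /**
     * Builds the driver properties discussed in the question.
     * serverSide toggles useServerPrepStmts so both variants can be tried.
     */
    static Properties preparedStatementProps(boolean serverSide) {
        Properties props = new Properties();
        props.setProperty("useUsageAdvisor", "true");   // emits the "used 1 or fewer times" warning
        props.setProperty("cachePrepStmts", "true");    // driver-side prepared statement cache
        props.setProperty("useServerPrepStmts", Boolean.toString(serverSide));
        return props;
    }
}
```

The result would then be handed to something like DriverManager.getConnection("jdbc:mysql://localhost:3306/appdb", props), after adding user and password; the URL here is an assumption, not taken from the application in question.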
Is the logic prepare-execute [once]-close flawed and essentially discouraged?
I don't see that as being a problem, per se. A given SQL statement needs to be "prepared" at some point, whether explicitly (with a PreparedStatement) or "on the fly" (with a Statement). There may be a tiny bit more overhead incurred if we use a PreparedStatement instead of a Statement for something that will only be executed once, but it is unlikely that the overhead involved would be significant, especially if the statement you cite is true:
typical DBMSes (Oracle, DB2, MSSQL, MySQL, Derby, etc.) will not even promote such a statement to prepared statement cache (or at least, their default JDBC driver/datasource will not).
What is discouraged is a pattern like this:
for (int thing : thingList) {
    PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ");
    ps.setInt(1, thing);
    ps.executeUpdate();
    ps.close();
}
because the PreparedStatement is only used once and the same SQL statement is being prepared over and over again. (Although even that might not be such a big deal if the SQL statement and its execution plan are indeed cached.) The better way to do that is
PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ");
for (int thing : thingList) {
    ps.setInt(1, thing);
    ps.executeUpdate();
}
ps.close();
... or even better, with a "try with resources" ...
try (PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ")) {
    for (int thing : thingList) {
        ps.setInt(1, thing);
        ps.executeUpdate();
    }
}
Note that this is true even without using batch processing. The SQL statement is still only prepared once and used several times.
As others have already stated, the most expensive part is parsing the statement in the database. Some database systems (this is pretty much DB dependent; I will speak mainly for Oracle) may profit if the statement is already parsed in the shared pool. (In Oracle terminology this is called a soft parse, which is cheaper than a hard parse, i.e. a parse of a new statement.) You can profit from a soft parse even if you use the prepared statement only once.
So the important task is to give the database a chance to reuse the statement. A typical counterexample is the handling of an IN list based on a collection in Hibernate. You end up with a statement such as
.. FROM T WHERE X in (?,?,?, … length based on the size of the collection,?,? ,?,?)
You can't reuse this statement if the size of the collection differs.
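To make the IN-list point concrete, here is a small sketch (the class and method names are hypothetical, and this is not how Hibernate itself generates SQL) showing that every distinct collection size yields a different statement text, and therefore a different entry for the database to parse:

```java
import java.util.Collections;

final class InListSql {
    /** Builds the statement text for an IN list with the given number of bind markers. */
    static String selectByIds(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        // One "?" per element: the text changes whenever the collection size changes.
        String markers = String.join(",", Collections.nCopies(idCount, "?"));
        return "SELECT * FROM T WHERE X IN (" + markers + ")";
    }
}
```

Calling selectByIds(2) and selectByIds(3) gives two different strings, so from the database's point of view they are two unrelated statements.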
A good starting point for getting an overview of the spectrum of SQL queries produced by a running application is (in Oracle) the V$SQL view. Filter PARSING_SCHEMA_NAME by your connection pool user and check the SQL_TEXT and the EXECUTIONS count.
Two extreme situations should be avoided:
Passing parameters (IDs) in the query text (this is well known) and
Reusing a statement for different access paths.
An example of the latter is a query that, with a provided parameter, performs an index access to a limited part of the table, while without the parameter all records should be processed (full table scan). In that case there is definitely no problem with creating two different statements (as parsing them leads to different execution plans).
PreparedStatements are preferable because one is needed regardless of whether you create one programmatically or not; internally the database creates one every time a query is run, and creating one programmatically just gives you a handle to it. Creating and throwing away a PreparedStatement every time doesn't add much overhead over using Statement.
A large effort is required by the database to create one (syntax checking, parsing, permissions checking, optimization, access strategy, etc). Reusing one bypasses this effort for subsequent executions.
Instead of throwing them away, try either writing the query in such a way that it can be reused, eg by ignoring null input parameters:
where someCol = coalesce(?, someCol)
so if you set the parameter to null (i.e. "unspecified"), the condition succeeds
or, if you absolutely must build the query every time, keep references to the PreparedStatements in a Map where the built query is the key, and reuse them if you get a hit. Use a WeakHashMap<String, PreparedStatement> for your map implementation to prevent running out of memory.
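A minimal sketch of the Map idea follows. Note that it deliberately swaps the suggested WeakHashMap for a bounded LRU map built on LinkedHashMap: a WeakHashMap would not close the statements it drops, and keys that are string literals are never collected anyway. All names here (StatementCache, prepareCached) are hypothetical, and the cache is assumed to be used with a single connection from a single thread.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache of open statements, keyed by SQL text. */
final class StatementCache<V> {
    private final int capacity;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, V> entries = new LinkedHashMap<>(16, 0.75f, true);

    StatementCache(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Returns the cached statement for this SQL text, or null on a miss. */
    V get(String sql) {
        return entries.get(sql);
    }

    /** Stores a statement and returns the evicted one (to be closed by the caller), or null. */
    V put(String sql, V statement) {
        entries.put(sql, statement);
        if (entries.size() <= capacity) {
            return null;
        }
        Iterator<Map.Entry<String, V>> it = entries.entrySet().iterator();
        V evicted = it.next().getValue();
        it.remove();
        return evicted;
    }

    /** Reuses a cached PreparedStatement on a hit; prepares, caches and evicts on a miss. */
    static PreparedStatement prepareCached(Connection conn,
                                           StatementCache<PreparedStatement> cache,
                                           String sql) throws SQLException {
        PreparedStatement ps = cache.get(sql);
        if (ps == null) {
            ps = conn.prepareStatement(sql);
            PreparedStatement evicted = cache.put(sql, ps);
            if (evicted != null) {
                evicted.close(); // release the least recently used statement
            }
        }
        return ps;
    }
}
```

With this shape the caller must not close the returned statement after each use; it stays open until it is evicted or the connection goes away.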
PreparedStatement created, but used 1 or fewer times. It is more efficient to prepare statements once, and re-use them many times
I think you may safely ignore this warning. It is similar to the claim that it is more efficient to first work 40 hours in the week, then sleep the next 56 hours, eat for the following 7 hours, and have the rest as your free time.
You need exactly one execution per event - should you perform 50 to get a better average?
SQL commands that run only once just waste database resources (memory, processing) when they are sent as a prepared statement. On the other hand, not using a prepared statement leaves the app vulnerable to SQL injection.
Is security (protection from SQL injection) working against performance (a prepared statement that runs just once)? Yes, but...
But it should not be that way. It is a design choice that Java does NOT provide an interface that lets developers call the right database API: SQL commands that run just once AND are properly protected against SQL injection! Why does Java not provide the correct tool for this specific task?
It could be as follows:
Statement Interface - Different SQL commands could be submitted. One execution of SQL commands. Bind variables not allowed.
PreparedStatement Interface - One SQL command could be submitted. Multiple executions of SQL command. Bind variables allowed.
(MISSING IN JAVA!) RunOnceStatement - One SQL command could be submitted. One execution of SQL command. Bind variables allowed.
For example, in Postgres the driver could call the correct routine (API) by mapping to the libpq functions:
- Statement Interface - call PQexec()
- PreparedStatement Interface - call PQprepare() / PQexecPrepared() / ...
- (MISSING IN JAVA!) RunOnceStatement Interface - call PQexecParams()
Using a prepared statement for SQL code that runs just once is a BIG performance problem: more processing in the database, and wasted database memory from maintaining plans that will not be called later. The plan cache gets so crowded that SQL commands that are actually executed multiple times could be evicted from it.
But Java does not provide the correct interface, and forces everybody to use prepared statements everywhere, just to protect against SQL injection...
Consider the following sql query:
SELECT a,b,c
FROM t
WHERE (id1 = :p_id1 OR :p_id1 IS NULL) AND (id2 = :p_id2 OR :p_id2 IS NULL)
Markus Winand in his book "SQL Performance explained" names this approach as one of the worst performance anti-patterns of all, and explains why (the database has to prepare plan for the worst case when all filters are disabled).
But later he also writes that for PostgreSQL this problem occurs only when re-using a statement (PreparedStatement) handle.
Assume now that the query above is wrapped in a function, something like:
CREATE FUNCTION func(IN p_id1 BIGINT,IN p_id2 BIGINT)
...
$BODY$
BEGIN
...
END;
$BODY$
So far I am unsure about a few points:
Will this problem still occur when the query is wrapped in a function? (I've tried to see the execution plan for the function call, but Postgres doesn't show me the details for the internal function calls even with SET auto_explain.log_nested_statements = ON).
Let's say I'm working with a legacy project and cannot change the function itself, only the Java code that calls it. Would it be better to avoid a prepared statement here and use a dynamic query each time? (Assume the execution time is quite long, up to several seconds.) For example, this, probably ugly, approach:
getSession().doWork(connection -> {
ResultSet rs = connection.createStatement().executeQuery("select * from func("+id1+","+id2+")");
...
})
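For the plain (non-function) form of the query shown earlier, one common alternative to the "= :p OR :p IS NULL" pattern is to assemble the WHERE clause from only the filters that are actually set, while still binding the values as parameters. The following is a sketch under that assumption; the class and method names are hypothetical, and it does not help when the query lives inside a function that cannot be changed.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterQuery {
    /** Builds the SELECT with one bind marker per non-null filter, and no predicate for null filters. */
    static String build(Long id1, Long id2) {
        List<String> predicates = new ArrayList<>();
        if (id1 != null) {
            predicates.add("id1 = ?");
        }
        if (id2 != null) {
            predicates.add("id2 = ?");
        }
        String sql = "SELECT a, b, c FROM t";
        if (!predicates.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", predicates);
        }
        return sql;
    }
}
```

This yields at most four distinct statement texts for two optional filters, each of which can get a plan suited to the filters it really contains; the non-null values are then bound in the same order with setLong.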
1. It depends.
When not using prepared statements, PostgreSQL plans a query anew every time, using the parameter values. This is known as a custom plan.
With prepared statements (and you're right, PL/pgSQL functions do use prepared statements) it's more complicated. PostgreSQL prepares the statement (parses its text and stores the parse tree), but re-plans it each time it is executed. Custom plans are generated at least 5 times. After that the planner considers using a generic plan (i.e. parameter-value-independent) if its cost is less than the average cost of the custom plans generated so far.
Note that the cost of a plan is the planner's estimate, not real I/O operations or CPU cycles.
So, the problem can occur, but you need some bad luck for that.
2. The approach you suggested will not work, because it doesn't change the behavior of the function.
In general, not using parameters is not as ugly in PostgreSQL as it is in e.g. Oracle, because PostgreSQL doesn't have a shared cache for plans. Prepared plans are stored in each backend's memory, so re-planning will not affect other sessions.
But as far as I know, there is currently no way to force the planner to use custom plans (other than reconnecting after 5 executions...).
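The plan-selection rule described under point 1 can be written down as a toy model. This is only a restatement of the description above in Java, not PostgreSQL's actual code; the class name, method name and the cost numbers in the usage note are invented for illustration.

```java
final class PlanChoice {
    /** Number of executions that always get a custom plan, per the description above. */
    static final int CUSTOM_PLAN_TRIES = 5;

    /**
     * Returns true if, under the simplified rule, the generic plan would be chosen:
     * at least 5 custom plans exist and the generic plan's estimated cost is
     * lower than the average estimated cost of those custom plans.
     */
    static boolean useGenericPlan(double[] customPlanCosts, double genericPlanCost) {
        if (customPlanCosts.length < CUSTOM_PLAN_TRIES) {
            return false; // still in the "always custom" phase
        }
        double sum = 0;
        for (double cost : customPlanCosts) {
            sum += cost;
        }
        return genericPlanCost < sum / customPlanCosts.length;
    }
}
```

For the "OR ... IS NULL" query, the generic plan has to cover the case where all filters are disabled, so its estimated cost would normally be higher than that of custom plans for selective parameters, and the model keeps choosing custom plans. The "bad luck" case is the one where the first executions happen to be expensive ones.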
Is it good practice to use prepared statements for SELECT SQL with connection pooling? (In my case I use the Tomcat JDBC connection pool.)
Does it add any advantage (speed-up), or will it add overhead for maintaining the prepared statements and connections, keeping them alive, or tracking whether they are closed? Pooled connections are maintained internally and get closed according to different settings, as specified here.
I am using a DataSource to get connections; the database is MariaDB.
While reading various posts, documentation and examples, I noticed that most prepared statements are built using INSERT or UPDATE queries. Does this suggest that for SELECT they will not add any advantage?
MariaDB/MySQL prepared statements do not have any advantages when it comes to query parsing / optimizing; the query plan is not preserved as it is on some other SQL databases.
They do have a performance advantage when it comes to transferring result sets, as column values can be transferred in binary form and get stored into result variables right away. With classic non-prepared statements all result fields are converted to textual form on the server side. This adds processing time on the server side, leads to more bytes having to be transferred over the wire, and depending on your application needs the client side may need to convert values back from textual to binary form (e.g. for integer and float values).
The other reason for using prepared statements, as also noted in the previous comments, is that it is a reliable way to prevent SQL injection, and that applies to SELECT as well as INSERT/UPDATE/DELETE.
It's good to use PreparedStatement if you can:
Prevent SQLInjection
Abstract Date/Time representation
Deal with Charset conversions
Readability (you see one string with full SQL)
As the SQL stays constant (with ?) the database might cache the plan and doesn't have to reparse
In the case of SELECT, the main focus of course lies with the parameters passed into the WHERE condition.
As for performance: this may depend, but I've never experienced PreparedStatements being significantly worse than simple Statements, if coded correctly of course.
The fact that you're pooling connections doesn't add much to this. The concept of somehow "preparing all the statements you're going to need on that connection for later" is not how PreparedStatements are meant to be used. It's perfectly fine to prepare the same tiny statement over and over and over, although if faced with a loop of INSERTs or UPDATEs it would be wise to reuse the PreparedStatement and/or batch the INSERTs.
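As a sketch of that last point, a loop of INSERTs can reuse one PreparedStatement and send the rows as a batch. The table and column names below are hypothetical, and whether the batch is actually sent in one round trip depends on the driver and its settings.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class ThingWriter {
    // Hypothetical table and column names, for illustration only.
    private static final String INSERT_SQL = "INSERT INTO thing (id) VALUES (?)";

    /** Prepares once, queues one batch entry per value, then executes the whole batch. */
    static int[] insertAll(Connection conn, List<Integer> things) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (int thing : things) {
                ps.setInt(1, thing);
                ps.addBatch();
            }
            // One update count per queued entry; the statement is closed on exit.
            return ps.executeBatch();
        }
    }
}
```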
Code like this can often be found online:
private static final String SQL = "SELECT * FROM table_name";
....
and a PreparedStatement is used for this SQL query. Why?
As far as I know, a PreparedStatement spends time precompiling the SQL statement. So it would seem that a Statement is faster than a PreparedStatement. Or am I mistaken?
Prepared statements are much faster when you have to run the same statement multiple times with different data. That's because the database will validate the query only once, whereas if you just use a Statement it will validate the query each time.
The other benefit of using PreparedStatements is to avoid causing a SQL injection vulnerability - though in your case your query is so simple you haven't encountered that.
For your query, the difference between running a prepared statement vs a statement is probably negligible.
EDIT: In response to your comment below, you will need to look closely at the DAO class to see what it is doing. If for example, each time the method is called it re-creates the prepared statement then you will lose any benefit of using prepared statements.
What you want to achieve is the encapsulation of your persistence layer, so that there is no specific call to MySQL or Postgres or whatever you are using, while at the same time taking advantage of the performance and security benefits of things like prepared statements. To do this you need to rely on Java's own objects such as PreparedStatement.
I personally would build my own DAO class for doing CRUD operations, using Hibernate underneath and the Java Persistence API to encapsulate it all, and that should use prepared statements for the security benefits. If you have a specific use-case for doing repeated operations, then I would be inclined to wrap that within its own object.
Hibernate can be configured to use whatever database vendor you are using via an XML file, and thus it provides really neat encapsulation of your persistence layer. However, it is quite a complicated product to get right!
Most of the time queries are not as simple as your example. If there is any variation to the query, i.e. any parameters that are not known at compile time, you must use PreparedStatement to avoid SQL injection vulnerabilities. This trumps any performance concerns.
If there is any difference between PreparedStatement and Statement, it will be highly dependent on the particular JDBC driver in question, and most of the time the penalty will be negligible compared to the cost of going to the database, executing the actual query and fetching the results back.
As far as I know, PreparedStatement is much faster than Statement. Here are some reasons why; read on for more detail.
The JDBC API provides connectivity with the database. We can then execute a query using either Statement or PreparedStatement.
There are four steps to executing a query:
Parsing the SQL query.
Compiling the query.
Optimizing the data acquisition path.
Executing the query.
The Statement interface is suitable when we do not need to execute the query multiple times.
Disadvantages of the Statement interface:
An attacker can easily get at the data. Suppose we have a query that takes a username and a password as parameters. The proper parameters would be username='abc#example.com' and password='abc123', which is correct. But an attacker can supply username='abc#example.com' or '1'=1 and password='', which means they can log in successfully. This is possible with Statement.
The SQL is also validated every time we fetch data from the database.
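The login example above can be sketched in Java as follows. The users table, its columns and the class name are hypothetical, and comparing a plain-text password only mirrors the example in the text; it is not a recommendation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class LoginQueries {
    /** Constant SQL text; user input never becomes part of it. */
    static final String PARAMETERIZED =
        "SELECT 1 FROM users WHERE username = ? AND password = ?";

    /** Vulnerable: user input is pasted into the SQL text, as with a plain Statement. */
    static String concatenated(String username, String password) {
        return "SELECT 1 FROM users WHERE username = '" + username
            + "' AND password = '" + password + "'";
    }

    /** Safe variant: values are bound as parameters and cannot change the query structure. */
    static boolean credentialsMatch(Connection conn, String username, String password)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PARAMETERIZED)) {
            ps.setString(1, username);
            ps.setString(2, password);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

With the concatenated form, an input such as abc' OR '1'='1 becomes part of the WHERE clause itself; with the parameterized form the same input is just an unusual username that matches nothing.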
Java's solution to the above problems is PreparedStatement.
This interface has many advantages. The main advantage of PreparedStatement is that the SQL is not validated every time, so you get the result faster. Further advantages of PreparedStatement are listed below.
1) We can safely provide the values of the query's parameters with setter methods.
2) It prevents SQL injection because it automatically escapes special characters.
3) When we use Statement, all four steps above are executed every time, but when we use PreparedStatement only the last step is executed, so it is faster than Statement.
Faster is not the consideration here. Parsing of the sql will generally be a tiny part of overall execution. See more at When should we use a PreparedStatement instead of a Statement?
When should one use a Statement instead of a PreparedStatement? I suppose Statement is used for queries with no parameters, but why not use a PreparedStatement? Which one is faster for queries with no params?
I suppose statement is used in queries with no parameter but why not use prepared statement ?
That's not even close. PreparedStatements are used in the case of SELECT, INSERT, UPDATE and DELETE statements that return a ResultSet or an update count. They will not work for DDL statements as pointed out by Joachim, and neither will they work for invocation of stored procedures where a CallableStatement ought to be used (this is not a difference between the two classes). As far as queries with no bind parameters are concerned, PreparedStatements can turn out to be better than Statements (see below).
Which one is faster for queries with no params.
PreparedStatements will turn out to be faster in the long run, over extended use in a single connection. This is because, although PreparedStatements have to be compiled, which would take some time (this really isn't a lot, so don't see this as a drawback), the compiled version essentially holds a reference to the SQL execution plan in the database. Once compiled, the PreparedStatement is stored in a connection specific cache, so that the compiled version may be reused to achieve performance gains. If you are using JDBC Batch operations, using PreparedStatements will make the execution of the batch much faster than the use of plain Statement objects, where the plan may have to be prepared time and again, if the database has to do so.
That depends on your requirements.
If you have a SQL statement which runs in a loop, or frequently with different parameters, then PreparedStatement is the best candidate, since it is pre-compiled and the execution plan for the parameterized SQL query is cached. Each time the query runs from the same PreparedStatement object it will use the cached execution plan and give better performance.
SQL injection can also be avoided by using PreparedStatement.
But if you are sure that you will run the SQL query only once, Statement will sometimes be the best candidate, since creating a PreparedStatement object sometimes makes an additional DB call: if the driver supports precompilation, the method Connection.prepareStatement(java.lang.String) will send the statement to the database for precompilation.
Read the article below to understand "Statement Versus PreparedStatement"
Java Programming with Oracle JDBC
Recently one of my colleagues made a comment that I should not use
LIKE '%'||?||'%'
rather use
LIKE ?
in the SQL and then replace the LIKE ? marker with LIKE '%'||?||'%' before I execute the SQL. He made the point that with a single parameter marker DB2 database will cache the statement always and thus cut down on the SQL prepare time.
However, I am not sure whether that is accurate. To me it should be the other way around, since we are doing more processing by doing a string replace on the SQL every time the query is executed.
Does anyone know if a single marker really speeds up execution? Just FYI - I am using Spring 2.5 JDBC framework and the DB2 version is 9.2.
My question is: does DB2 treat "LIKE ?" differently from "LIKE '%'||?||'%'" as far as caching and preparation go?
'LIKE ?' is a PreparedStatement. Prepared statements are an optimization at the JDBC driver level. The thinking is that databases analyze queries to decide how to most efficiently process them. The DB can then cache the resulting query plan, keyed on the full statement. Reusing identical statements reuses the query plan. So basically if you are running the same query multiple times with different comparison strings, and if the query plan stays cached, then yes, using 'LIKE ?' will be faster.
Some useful (though somewhat dated) info on PreparedStatements:
Prepared Statments
More Prepared Statments
I haven't done too much DB2, not since the 90's, and I'm not really sure I understand what your underlying question is. Way back then I got a phone call from the head of the DBA team. "What are you doing different than every other programmer we've got!??" Mind you, this was early in my career, so tentatively I answered, "Nothing....", imagine it in kind of a whiny voice. "Well then, why do your queries take 50% of the CPU resources of any of the other guys'???". I took a quick poll of all the other guys and found I was the only one using prepared statements. Now under the covers Spring automatically makes prepared statements, and they've improved statement caching in the database a lot over the years, but if you make use of them properly, you can get the speedup there, AND it'll make the statement cache swap things out less often. It really depends on your use case: if you're only going to hit the query once, then there would be no difference; if it's a few thousand times, obviously it would make a much greater difference.
in the SQL and then replace the LIKE ? marker with LIKE '%'||?||'%' before I execute the SQL. He made the point that with a single parameter marker DB2 database will cache the statement always and thus cut down on the SQL prepare time.
Unless DB2 is some sort of weird alien SQL database, or its driver does some crazy things, the database server will never see your prepared statement until you actually execute it. So you can swap clauses in and out of the PreparedStatement all day long, and it will have no effect until you actually send it to the server when you execute it.
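A third option, not raised in the thread and offered only as a sketch, is to keep the SQL text constant as LIKE ? and put the wildcards into the bound value instead of into the SQL. The table, column and class names are hypothetical, and wildcard characters (% or _) inside the search term itself are not escaped here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class LikeSearch {
    // Hypothetical table and column; the SQL text never changes between calls.
    static final String SQL = "SELECT name FROM customer WHERE name LIKE ?";

    /** Wraps the search term in wildcards so it matches anywhere in the column value. */
    static String containsPattern(String term) {
        return "%" + term + "%";
    }

    /** Runs the constant statement with the wildcard pattern bound as the single parameter. */
    static List<String> search(Connection conn, String term) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, containsPattern(term));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }
}
```

Either way the statement has exactly one parameter marker and identical text on every call, so no string replacement on the SQL is needed before execution.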