A few days ago I had to create some processing performance tests using an in memory computing framework. So in order to do this, I needed a big data pool, which was increased incrementally given the various performance tests.
The DB was Oracle, containing a table of 22 fields. This table needed to be populated gradually from 1 mil records to 100 mil records.
For populating the table with 1 mil, I generated random test data and used java Statement to insert it into DB, and that has taken around 17 and 16 seconds minutes. After that, I quickly realized that to populate a 100 mil records table will take forever so I tried it with PreparedStatement because I knew that is a bit faster… but the difference was so immense, 1 min and 24 seconds, that I have started to search the web the reason behind this and found out some reasons but nothing that, in my opinion, should have this impact.
this is what I have found that might explain this difference:
LINK
PreparedStatement gets pre compiled
In database and there access plan is also cached in database, which allows database to execute parametric query written using prepared statement much faster than normal query because it has less work to do. You should always try to use PreparedStatement in production JDBC code to reduce load on database. In order to get performance benefit its worth noting to use only parametrized version of sql query and not with string concatenation.
BUT
all the data was generated randomly, so no major caching from oracles side should be involved.
Oracle is probably able to cache the query plan in the statement cache; per the Oracle® Database JDBC Developer's Guide Implicit Statement Caching,
When you enable implicit Statement caching, JDBC automatically caches the prepared or callable statement when you call the close method of this statement object. The prepared and callable statements are cached and retrieved using standard connection object and statement object methods.
Plain statements are not implicitly cached, because implicit Statement caching uses a SQL string as a key and plain statements are created without a SQL string. Therefore, implicit Statement caching applies only to the OraclePreparedStatement and OracleCallableStatement objects, which are created with a SQL string.
Related
Is it good practice to use Prepared statement for SELECT SQL with connection pooling. (In my case I use Tomcat JDBC connection pool).
Does it add any advantage(speed ups) or it will add overhead for maintaining the Prepared Statements, connections and keep them alive or track whether closed as Pooled connections are maintained internally and they get closed according to different settings as specified here.
I am using DataSource to get connection, Database is MariaDB.
While reading various posts, documentations and examples most of Prepared Statement have been built using INSERT or UPDATE queries. Does it points that for SELECT it will not add any advantage?
MariaDB/MySQL prepared statements do not have any advantages when it comes to query parsing / optimizing, the query plan is not preserved as on some other SQL databases.
They do have a performance advantage when it comes to transferring result sets as column values can be transfered in binary form and get stored into result variables right away. With classic non-prepared statements all result fields are converted to textual form on the server side. This adds processing time on the server side, leads to more bytes having to be transfered over the wire, and depending on your application needs the client side may need to convert values back from textual to binary form (e.g. for integer and float values).
The other reason for using prepared statements, as also noted in the previous comments, is that it is a reliable way to prevent SQL injection, and that applies to SELECT as well as INSERT/UPDATE/DELETE
It's good to use PreparedStatement if you can:
Prevent SQLInjection
Abstract Date/Time representation
Deal with Charset conversions
Readability (you see one string with full SQL)
As the SQL stays constant (with ?) the database might cache the plan and doesn't have to reparse
In case of SELECT the main focus of cause lies with the parameters passed into the WHERE condition.
As for performance: This may depend - but I've never experienced PreparedStatements to be significantly worse than simple Statements - if coded correct of cause.
The fact that you're pooling connections doesn't add much to this. The concept of somehow "preparing all the statements you're going to need on that connection for later" is not how PreparedStatments are meant to be used. It's perfectly fine to prepare the same tiny Statement over and over and over - altough if faced with a loop of INSERTs or UPDATEs it would be wise to reuse PreparedStatement and/or batch the INSERTs
I have looked into various places, and have heard a lot of dubious claims, ranging from PreparedStatement should be preferred over Statement everywhere, even if only for the performance benefit; all the way to claims that PreparedStatements should be used exclusively for batched statements and nothing else.
However, there seems to be a blind spot in (primarily online) discussions I have followed. Let me present a concrete scenario.
We have an EDA-designed application with a DB connection pool. Events come, some of them require persistence, some do not. Some are artificially generated (e.g. update/reset something every X minutes, for example).
Some events come and are handled sequentially, but other types of events (also requiring persistence) can (and will) be handled concurrently.
Aside from those artificially generated events, there is no structure in how events requiring persistence arrive.
This application was designed quite a while ago (roughly 2005) and supports several DBMSes. The typical event handler (where persistence is required):
get connection from pool
prepare sql statement
execute prepared statement
process the result set, if applicable, close it
close prepared statement
prepare a different statement, if necessary and handle the same way
return connection to pool
If an event requires batch processing, the statement is prepared once and addBatch/executeBatch methods are used. This is an obvious performance benefit and these cases are not related to this question.
Recently, I have received an opinion, that the whole idea of preparing (parsing) a statement, executing it once and closing is essentially a misuse of PreparedStatement, provides zero performance benefits, regardless of whether server or client prepared statements are used and that typical DBMSes (Oracle, DB2, MSSQL, MySQL, Derby, etc.) will not even promote such a statement to prepared statement cache (or at least, their default JDBC driver/datasource will not).
Moreover, I had to test certain scenarios in dev environment on MySQL, and it seems that the Connector/J usage analyzer agrees with this idea. For all non-batched prepared statements, calling close() prints:
PreparedStatement created, but used 1 or fewer times. It is more efficient to prepare statements once, and re-use them many times
Due to application design choices outlined earlier, having a PreparedStatement instance cache that holds every single SQL statement used by any event for each connection in the connection pool sounds like a poor choice.
Could someone elaborate further on this? Is the logic "prepare-execute (once)-close" flawed and essentially discouraged?
P.S. Explicitly specifying useUsageAdvisor=true and cachePrepStmts=true for Connector/J and using either useServerPrepStmts=true or useServerPrepStmts=false still results in warnings about efficiency when calling close() on PreparedStatement instances for every non-batched SQL statement.
Is the logic prepare-execute [once]-close flawed and essentially discouraged?
I don't see that as being a problem, per se. A given SQL statement needs to be "prepared" at some point, whether explicitly (with a PreparedStatement) or "on the fly" (with a Statement). There may be a tiny bit more overhead incurred if we use a PreparedStatement instead of a Statement for something that will only be executed once, but it is unlikely that the overhead involved would be significant, especially if the statement you cite is true:
typical DBMSes (Oracle, DB2, MSSQL, MySQL, Derby, etc.) will not even promote such a statement to prepared statement cache (or at least, their default JDBC driver/datasource will not).
What is discouraged is a pattern like this:
for (int thing : thingList) {
PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ");
ps.setInt(1, thing);
ps.executeUpdate();
ps.close();
}
because the PreparedStatement is only used once and the same SQL statement is being prepared over and over again. (Although even that might not be such a big deal if the SQL statement and its executation plan are indeed cached.) The better way to do that is
PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ");
for (int thing : thingList) {
ps.setInt(1, thing);
ps.executeUpdate();
}
ps.close();
... or even better, with a "try with resources" ...
try (PreparedStatement ps = conn.prepareStatement(" {some constant SQL statement} ")) {
for (int thing : thingList) {
ps.setInt(1, thing);
ps.executeUpdate();
}
}
Note that this is true even without using batch processing. The SQL statement is still only prepared once and used several times.
As others already stated, the most expensive part is the parsing the statement in the database. Some database systems (this is pretty much DB dependent – I will speak mainly for Oracle) may profit, if the statement is already parsed in the shared pool. (In Oracle terminology this is called a soft parse that is cheaper than a hard parse - a parse of a new statement). You can profit from soft parse even if you use the prepared statement only once.
So the important task is to give the database a chance to reuse the statement. A typical counter example is the handling of the IN list based on a collection in Hibernate. You end with the statement such as
.. FROM T WHERE X in (?,?,?, … length based on the size of the collection,?,? ,?,?)
You can’t reuse this statement if the size of the collection differ.
A good starting point to get overview of the spectrum of the SQL queries produced by a running application is (by Oracle) the V$SQL view. Filter the PARSING_SCHEMA_NAME with you connection pool user and check the SQL_TEXT and the EXECUTIONS count.
Two extreme situation should be avoided:
Passing parameters (IDs) in the query text (this is well known) and
Reusing statement for different access paths.
An example of the latter is a query that with a provided parameter performs an index access to a limited part of the table, while without the parameter all records should be processed (full table scan). In that case is definitively no problem to create two different statements (as the parsing of both leads to different execution plans).
PreparedStatements are preferable because one is needed regardless of whether you create one programmatically or not; internally the database creates one every time a query is run - creating one programatically just gives you a handle to it. Creating and throwing away a PreparedStatement every time doesn't add much overhead over using Statement.
A large effort is required by the database to create one (syntax checking, parsing, permissions checking, optimization, access strategy, etc). Reusing one bypasses this effort for subsequent executions.
Instead of throwing them away, try either writing the query in such a way that it can be reused, eg by ignoring null input parameters:
where someCol = coalesce(?, someCol)
so if you set the parameter to null (ie "unspecified), the condition succeeds)
or if you absolutely must build the query every time, keep references to the PreparedStatements in a Map where the built query is the key and reuse them if you get a hit. Use a WeakHashMap<String, PreparedStatements> for you map implementation to prevent running out of memory.
PreparedStatement created, but used 1 or fewer times. It is more efficient to prepare statements once, and re-use them many times
I thing you may safely ignore this warning, it is similar to a claim It is more efficient to work first 40 hour in the week, than sleep next 56 hours, eat following 7 hours and the rest is your free time.
You need exactly one execution per event - should you perform 50 to get a better average?
SQL commands that run only once, in terms of performance, just waste database resources (memory, processing) being sent in a Prepared Statement. In other hand, not using Prepared Statement let app vulnerable to SQL injection.
Are security (protection from SQL injection) working against performance (prepared statement that runs just once) ? Yes, but...
But it should not be that way. It's a choice java does NOT implement an interface to let developers call the right database API: SQL commands that run just once AND are properly protected against SQL injection ! Why Java just not implement the correct tool for this specific task?
It could be as follows:
Statement Interface - Different SQL commands could be submitted. One execution of SQL commands. Bind variables not allowed.
PreparedStatement Interface - One SQL command could be submitted. Multiple executions of SQL command. Bind variables allowed.
(MISSING IN JAVA!) RunOnceStatement - One SQL command could be submitted. One execution of SQL command. Bind variables allowed.
For example, the correct routine (API) could be called in Postgres, by driver mapping to:
- Statement Interface - call PQExec()
- PreparedStatement Interface - call PQPrepare() / PQExecPrepare() / ...
- (MISSING IN JAVA!) RunOnceStatement Interface - call PQExecParams()
Using prepared statement in SQL code that runs just once is a BIG performance problem: more processing in database, waste database memory, by maintaining plans that will not called later. Cache plans get so crowed that actual SQL commands that are executed multiple times could be deleted from cache.
But Java does not implement the correct interface, and forces everybody to use Prepared Statement everywhere, just to protect against SQL injection...
I am using Java to read from a SQL RDBMS and return the results to the user. The problem is that the database table has 155 Million rows, which make the wait time really long.
I wanted to know if it is possible to retrieve results as they come from the database and present them incrementaly to the user (in batches).
My query is a simple SELECT * FROM Table_Name query.
Is there a mechanism or technology that can give me callbacks of DB records, in batches until the SELECT query finishes?
The RDBMS that is used is MS SQL Server 2008.
Thanks in advance.
Methods Statement#setFetchSize and Statement#getMoreResults are supposed to allow you to manage incremental fetches from the database. Unfortunately, this is the interface spec and vendors may or may not implement these. Memory management during a fetch is really down to the vendor (which is why I wouldn't strictly say that "JDBC just works like this").
From the JDBC documentation on Statement :
setFetchSize(int rows)
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for ResultSet
objects genrated by this Statement.
getMoreResults()
Moves to this Statement object's next result, returns true if it is a
ResultSet object, and implicitly closes any current ResultSet object(s)
obtained with the method getResultSet.
getMoreResults(int current)
Moves to this Statement object's next result, deals with any current
ResultSet object(s) according to the instructions specified by the given
flag, and returns true if the next result is a ResultSet object.
current param indicates Keep or close current ResultSet?
Also, this SO response answers about the use of setFetchSize with regards to SQLServer 2005 and how it doesn't seem to manage batched fetches. The recommendation is to test this using the 2008 driver or moreover, to use the jTDS driver (which gets thumbs up in the comments)
This response to the same SO post may also be useful as it contains a link to SQLServer driver settings on MSDN.
There's also some good info on the MS technet website but relating more to SQLServer 2005. Couldn't find the 2008 specific version in my cursory review. Anyway, it recommends creating the Statement with:
com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY (2004) scrollability for forward-only, read-only access, and then use the setFetchSize method to tune performance
Using pagination (LIMIT pageno, rows / TOP) might create holes and duplicates, but might be used in combination with checking the last row ID (WHERE id > ? ORDER BY id LIMIT 0, 100).
You may use TYPE_FORWARD_ONLY or FETCH_FORWARD_ONLY.
This is exactly how is JDBC driver supposed to work (I remember the bug in old PostgreSQL driver, that caused all fetched records to be stored in memory).
However, it enables you to read record when the query starts to fetch them. This is where I would start to search.
For example, Oracle optimizes SELECT * queries for fetching the whole set. It means it can take a lot of time before first results will appear. You can give hints to optimize for fetching first results, so you can show first rows to your user quite fast, but the whole query can take longer to execute.
You should test your query on console first, to check when it starts to fetch results. Then try with JDBC and monitor the memory usage while you iterate through ResultSet. If the memory usage grows fast, check if you have opened ResultSet in forward-only and read-only mode, if necessary update driver.
If such solution is not feasible because of memory usage, you can still use cursors manually and fetch N rows (say, 100) in each query.
Cursor documentation for MSSQL: for example here: http://msdn.microsoft.com/en-us/library/ms180152.aspx
I have a Java program that connects to a SQL Server 2008 database and performs modifications. If I have a million records I would like to modify, is it bad practice to do as follows:
for(all of the records I need to modify) {
PreparedStatement pst = conn.prepareStatement(someQuery);
// set record specific parameters for pst
// execute pst
}
Or should I build a single query and execute it? Will it make a difference? Does it depend on whether it is an UPDATE, INSERT, or DELETE? My SQL knowledge is quite basic.
If the query is the same for all of your iterations, create the PreparedStatement before the iteration, and in the end of iteration call PreparedStatemetn.executeBatch() as Jesse Webb suggested.
I recommend to commit your transaction after a couple of iterations (may be after each 1000 iterations), because when updating or deleting a record without committing the transaction, there will be locks on mutating records which can cause problem for other users of the database (if you are not the only client of those database objects!).
For large amounts of UPDATEs, it is best to use Statement.executeBatch().
Try Google'ing for "java executebatch example" for examples.
You will most likely want to also make sure you use Transactions properly, a lot of the overhead of queries come from implicit Transaction (one for every query) where using a single Transaction for many statements can be much more efficient.
When to use statement instead of prepared statement. i suppose statement is used in queries with no parameter but why not use prepared statement ? Which one is faster for queries with no params.
I suppose statement is used in queries with no parameter but why not use prepared statement ?
That's not even close. PreparedStatements are used in the case of INSERT, UPDATE and DELETE statements that return a ResultSet or an update count. They will not work for DDL statements as pointed out by Joachim, and neither will they work for invocation of stored procedures where a CallableStatement ought to be used (this is not a difference between the two classes). As far as queries with no bind parameters are concerned, PreparedStatements can turn out to be better than Statements (see below).
Which one is faster for queries with no params.
PreparedStatements will turn out to be faster in the long run, over extended use in a single connection. This is because, although PreparedStatements have to be compiled, which would take some time (this really isn't a lot, so don't see this as a drawback), the compiled version essentially holds a reference to the SQL execution plan in the database. Once compiled, the PreparedStatement is stored in a connection specific cache, so that the compiled version may be reused to achieve performance gains. If you are using JDBC Batch operations, using PreparedStatements will make the execution of the batch much faster than the use of plain Statement objects, where the plan may have to be prepared time and again, if the database has to do so.
That's depending on Your requirement.
If you have a SQL statement which runs in a loop or frequently with different parameters then PreparedStatement is the best candidate since it is getting pre-compiled and cache the execution plan for this parameterized SQL query. Each time it runs from the same PreparedStatement object it will use cached execution plan and gives the better performance.
Also SQL injection can be avoided using PreparedStatement .
But if you are sure that you run SQL query only once, sometimes Statement will be the best candidate since when you create PreparedStatement object sometimes it make additional db call, if the driver supports precompilation, the method Connection.prepareStatement(java.lang.String) will send the statement to the database for precompilation.
Read below article to understand "Statement Versus PreparedStatement"
Java Programming with Oracle JDBC