[Java 7/Oracle] I have a multi-threaded application in which I plan to use a database connection pool. I would like to use prepared statements; however, it seems that prepared statements contain and would therefore seem to be inextricably bound to a single database connection object. The paradigm I want is NOT "open a connection, prepare a statement, do the same query thousands of times, then close the connection" as seems to be the sample code everywhere; the paradigm I want is "precompile this statement so it is run as efficiently as possible - get a random connection from the pool - execute the statement against that connection - release the connection back to the pool". Is this even possible in Java?
If you use a connection pool, connections are not actually closed when you call close on the connection object. Instead, the connection is returned to the pool. This is usually achieved by wrapping the original connection in a proxy that intercepts your call to close.
Many connection pools and some drivers offer the possibility to cache prepared statements, which avoids repeatedly preparing the same statements. Of course, because a prepared statement is tied to its connection, the same statement will probably end up being prepared once for every connection in the pool.
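As a rough illustration of why the cache ends up being per connection, here is a minimal sketch of a statement cache owned by a single connection. The class name StatementCacheSketch is hypothetical; real pools and drivers implement this internally and expose it through their own configuration, so treat this as a picture of the idea rather than any library's API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a statement cache owned by ONE physical connection. */
final class StatementCacheSketch {
    private final Connection connection;
    // Keyed by SQL text; not thread-safe, on the assumption that a
    // connection is used by only one thread at a time.
    private final Map<String, PreparedStatement> cache = new HashMap<>();

    StatementCacheSketch(Connection connection) {
        this.connection = connection;
    }

    /** Returns the cached statement for this SQL text, preparing it on first use only. */
    PreparedStatement prepare(String sql) throws SQLException {
        PreparedStatement statement = cache.get(sql);
        if (statement == null) {
            statement = connection.prepareStatement(sql);
            cache.put(sql, statement);
        }
        return statement;
    }
}
```

With a cache like this attached to each pooled connection, application code can keep calling prepareStatement on whichever connection it borrows, and only the first call per connection pays the preparation cost.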
Related
Will closing the preparedStatement also close and return the connection to the connection pool?
public void insertProjectIntoDatabase(Project project) {
String insertProjectIntoDatabase =
"INSERT INTO projects(Project_Id, Project_Name, Project_StartDate, Deadline) " +
"VALUES (?, ?, ?, ?)";
try {
preparedStatement = DBCPDataSource.getConnection().prepareStatement(insertProjectIntoDatabase);
preparedStatement.setInt(1, project.getProjectId());
preparedStatement.setString(2, project.getName());
preparedStatement.setDate(3, java.sql.Date.valueOf(project.getStartDate()));
preparedStatement.setDate(4, java.sql.Date.valueOf(project.getDeadline()));
preparedStatement.execute();
preparedStatement.close();
}
catch (SQLException e)
{
System.out.println("Error happened in ProjectRepository at insertProjectIntoDatabase(): " + e.getMessage());
}
}
Bonus question:
I have created performance tests for creating a new connection each time an object needs one, Singleton connection and connection pool.
Singleton - Fastest
Creating a new connection each time - Slower (1.2s slower than the one above)
Connection Pool - Slowest (First connection - 2-3s slower than the one above, following tests are 0.4s slower than the one above)
I am using Apache Commons DBCP for the connection pool.
I thought using a connection pool would be just a little slower than a Singleton connection.
Have I done something wrong?
You asked:
Will closing the preparedStatement also close and return the connection to the connection pool?
Start with the documentation:
Releases this Statement object's database and JDBC resources immediately instead of waiting for this to happen when it is automatically closed. It is generally good practice to release resources as soon as you are finished with them to avoid tying up database resources.
Calling the method close on a Statement object that is already closed has no effect.
Note:When a Statement object is closed, its current ResultSet object, if one exists, is also closed.
No mention of closing the connection.
Try intuition: Do we ever run more than one statement in SQL? Yes, obviously. So logically the connection needs to survive across multiple statements to be useful.
Lastly: Try it yourself, as an empirical test. Call Connection#isClosed after calling Statement#close.
➥ No, closing the statement does not close the connection.
For the simplest code, learn to use try-with-resources syntax to auto-close your database resources such as result set, statement, and connection. You’ll find many examples of such code on this site, including some written by me.
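As a sketch of what that could look like for the insert in the question: the version below takes a DataSource and the individual field values as parameters, which is an assumption made to keep the example self-contained (the class name ProjectInserts is hypothetical, and the dates are assumed to be java.time.LocalDate).

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalDate;
import javax.sql.DataSource;

final class ProjectInserts {
    private static final String INSERT_PROJECT =
        "INSERT INTO projects (Project_Id, Project_Name, Project_StartDate, Deadline) "
        + "VALUES (?, ?, ?, ?)";

    /** Inserts one row and returns the update count reported by the driver. */
    static int insertProject(DataSource dataSource, int id, String name,
                             LocalDate startDate, LocalDate deadline) throws SQLException {
        // Resources are closed in reverse order of declaration:
        // the statement first, then the connection (which, with a pool,
        // means the connection goes back to the pool).
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(INSERT_PROJECT)) {
            statement.setInt(1, id);
            statement.setString(2, name);
            statement.setDate(3, Date.valueOf(startDate));
            statement.setDate(4, Date.valueOf(deadline));
            return statement.executeUpdate();
        }
    }
}
```

Both close calls happen even if one of the setters or executeUpdate throws, which the hand-written close in the question does not guarantee.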
As for connection pools, yes, calling close on a connection retrieved from a pool causes the connection object to be returned to the pool. The pool may choose to re-use the connection, or the pool may choose to close the connection. (Not our concern.)
The only point to a connection pool is speed. If opening a connection to the database takes a significant amount of time, we can save that time by re-using existing connections. Generating and re-using connections is the job of a connection pool.
If a connection pool is showing the slowest results in your testing, then there is something seriously wrong with either your pool or your tests. You did not reveal to us your tests, so we cannot help there. Note: As Marmite Bomber commented, be sure your tests do not include the time needed to establish the connection pool.
Frankly, I have found in my experience that opening a database connection does not take a significant amount of time. Furthermore, the details involved in properly implementing a connection pool are complex and treacherous, as evidenced by the list of failed and abandoned connection pool implementation projects. That, combined with the inherent risks such as a transaction being left open on a retrieved connection, led me to avoid the use of connection pools. I would posit that using a connection pool before collecting proof of an actual problem is a case of premature optimization.
I suggest using an implementation of the interface DataSource as a way to mask from the rest of your code whether you are using a pool and to hide which pool implementation you are currently using. Using DataSource gives you the flexibility to change between using or not using a connection pool, and the flexibility to change between pools. Those changes become deployment choices, with no need to change your app programming.
Pools are meant to improve performance, not degrade it. DBCP is naive, complicated, and outdated.
I don't think it's appropriate for a production application, especially when so many drivers support pooling in their DataSource natively. The entire pool gets locked the whole time a new connection attempt is made to the database. So, if something happens to your database that results in slow connections or timeouts, other threads are blocked when they try to return a connection to the pool—even though they are done using a database.
Even C3P0 performs terribly.
Please try using one of these two connection pools: tomcat_connection_pool or HikariCP.
Now, coming to the main part of your question: have you closed the connection correctly?
Whenever you use a connection pool and fetch an available connection from it, you do not have to close the connection that you fetched in your DAO layer. The pool manages the connections that you have created, and each connection that the pool lends out has a timeout associated with it, before which it has to be returned to the pool. When the pool is shut down, all of its connections are shut down too.
For more information on how to configure these properties in your connection pool, please check the links above for each of the connection pools.
I am trying to execute multiple queries in a single function using a single connection object. I would like to know what the best practice is for closing the database connection in a scenario like this. Currently, I close the connection once all the db calls are completed. I am wondering whether I need to close the connection and open a new connection for every db call. Which is better?
You should keep the Connection open as long as possible. Creating a database connection is a (relatively) expensive operation, so you don't want to do it more often than you need to.
To manage the lifetime, you should use the try-with-resources statement assuming you are on at least Java 7:
try (Connection connection = myDataSource.getConnection()) {
// Do your queries here
}
I have the following scenario:
MethodA() calls MethodB()
MethodB() calls MethodC()
All methods have to execute some query to DB. So, to do this I create a connection object and pass it along the chain of methods to reuse the connection object.
Assumption here is that connection pooling is not being employed.
Now my question is: should only a single connection be opened, reused, and closed at the starting point (in the above example, the connection would be opened and closed in MethodA)? Or should I create a separate connection for each method?
Reusing the connection seems better, but then I will have to keep the connection open till the control comes back to MethodA().
I have read that reusing the connection is better as they are expensive to create. But then I have also read that it's better to close the connection as soon as possible, i.e., once you are done with the query call.
Which approach is better and why?
It sounds like you are only querying the DB and not updating or inserting. If that is the case then you avoid many of the transactional semantics in such a nested procedure call.
If that is true, then simply connect once, do all your querying and close the connection. While usage of a connection pool is somewhat orthogonal to your question - use one if you can. They greatly simplify your code because you can have the pool automatically test the connection before it gives one to you. It will auto-create a new connection if the connection was lost (let's say because the DB was bounced).
Finally, you want to minimize the number of times you create a DB Connection BECAUSE it is expensive. However, this is often non-trivial. Databases themselves only support a maximum number of connections. If there are many clients, then you would need to take this into consideration. If you have the trivial case - one database and your program is the only one making connections, then open the connection and leave it open for the duration of the program. This would require you to validate it, so using a DB Pool, size of 1, avoids that.
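The "connect once, do all your querying, close" approach for the MethodA/MethodB/MethodC chain could look roughly like the sketch below. The class name CallChain, the runQuery helper, and the SQL strings are placeholders invented for illustration; only the shape (open in the outermost method, pass the connection down, close where it was opened) reflects the advice above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

final class CallChain {
    private final DataSource dataSource;

    CallChain(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Entry point: opens the connection once and closes it when the whole chain is done. */
    void methodA() throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            runQuery(connection, "SELECT 1 FROM table_a"); // placeholder SQL
            methodB(connection);
        }
    }

    private void methodB(Connection connection) throws SQLException {
        runQuery(connection, "SELECT 1 FROM table_b"); // placeholder SQL
        methodC(connection);
    }

    private void methodC(Connection connection) throws SQLException {
        runQuery(connection, "SELECT 1 FROM table_c"); // placeholder SQL
    }

    /** Each query gets its own short-lived statement; the connection is NOT closed here. */
    private static void runQuery(Connection connection, String sql) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.execute(); // result handling omitted in this sketch
        }
    }
}
```

The inner methods never close the connection they are handed, so ownership stays with methodA, which is the only place that opened it.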
As per my understanding, JDBC Connection Pooling (at a basic level) works this way:
create connections during app initialization and put in a cache
provide these cached connections on demand to the app
a separate thread maintains the Connection Pool, performing activities like:
discard connections that have been used (closed)
create new connections and add to the cache to maintain a specific count of connections
But, whenever I hear the term "connection reuse" in a JDBC Connection Pooling discussion, I get confused. When does the connection reuse occur?
Does it mean that the Connection Pool provides the same connection for two different database interactions (without closing it)? Or, is there a way to continue using a connection even after it gets closed after a DB call?
Connection pooling works by re-using connections. Applications "borrow" a connection from the pool, then "return" it when finished. The connection is then handed out again to another part of the application, or even a different application.
This is perfectly safe as long as the same connection is not in use by two threads at the same time.
The key point with connection pooling is to avoid creating new connections where possible, since it's usually an expensive operation. Reusing connections is critical for performance.
The connection pool does not provide you with the actual Connection instance from the driver, but returns a wrapper. When you call 'close()' on a Connection instance from the pool, it will not close the driver's Connection, but instead just return the open connection to the pool so that it can be re-used (see skaffman's answer).
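To make the wrapper idea concrete, here is a deliberately tiny pool written for illustration only. ToyPool, borrow, and idleCount are hypothetical names, it needs Java 8 or later for the lambda, and it leaves out everything a real pool does (validation, sizing, timeouts, resetting connection state, rejecting calls on a wrapper that was already closed). The one thing it shows is close() being intercepted so that the physical connection stays open and goes back to the idle list.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy pool for illustration only; not suitable for real use. */
final class ToyPool {
    private final Deque<Connection> idle = new ArrayDeque<>();

    ToyPool(List<Connection> physicalConnections) {
        idle.addAll(physicalConnections);
    }

    /** Lends out a wrapper around an idle physical connection. */
    synchronized Connection borrow() {
        final Connection physical = idle.pop(); // NoSuchElementException if none are idle
        final AtomicBoolean returned = new AtomicBoolean(false);
        return (Connection) Proxy.newProxyInstance(
            ToyPool.class.getClassLoader(),
            new Class<?>[] { Connection.class },
            (proxy, method, args) -> {
                if (method.getName().equals("close")) {
                    // Intercept close(): hand the still-open physical
                    // connection back instead of closing it. Repeated
                    // close() calls on the same wrapper are ignored.
                    if (returned.compareAndSet(false, true)) {
                        giveBack(physical);
                    }
                    return null;
                }
                try {
                    return method.invoke(physical, args); // delegate everything else
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the driver's own exception
                }
            });
    }

    private synchronized void giveBack(Connection physical) {
        idle.push(physical);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Borrowing, closing, and borrowing again from a pool of one hands out the same physical connection twice, which is exactly the "reuse" the question asks about.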
Connection pooling reuses connections.
Here is how Apache DBCP works under the hood.
Connection poolableConnection= apacheDbcpDataSource.getConnection();
The Apache DBCP implementation returns a connection wrapper of type PoolableConnection.
poolableConnection.close();
PoolableConnection.close() checks whether the actual underlying connection is closed; if it is not, it returns this PoolableConnection instance to the connection pool (GenericObjectPool in this case).
if (!isUnderlyingConectionClosed) {
// Normal close: underlying connection is still open, so we
// simply need to return this proxy to the pool
try {
genericObjectPool.returnObject(this); //this is PoolableConnection instance in this case
....
}
My understanding is the same as stated above and, thanks to a bug, I have evidence that it's correct. In the application I work with there was a bug: an SQL command with an invalid column name. On execution an exception is thrown. If the connection is closed, then the next time a connection is obtained and used, with correct SQL this time, an exception is thrown again, and the error message is the same as the first time even though the incorrect column name doesn't even appear in the second SQL. So the connection is obviously being reused. If the connection is not closed after the first exception is thrown (because of the bad column name), then the next time a connection is used everything works just fine. Presumably this is because the first connection hasn't been returned to the pool for reuse. (This bug is occurring with Java 1.6_30 and a connection to a MySQL database.)
What is a Connection object in JDBC? How is this connection maintained (I mean, is it a network connection)? Are they TCP/IP connections? Why is it a costly operation to create a connection every time? Why do these connections become stale after some time, so that I need to refresh the pool? Why can't I use one connection to execute multiple queries?
These connections are TCP/IP connections. To avoid the overhead of creating a new connection every time, there are connection pools that expand and shrink dynamically. You can use one connection for multiple queries. I think you mean that you release it to the pool. If you do that, you might get the same connection back from the pool. In this case it just doesn't matter whether you run one query or several.
The cost of a connection is the time it takes to connect. In addition, the database prepares some things, such as sessions, for every connection, and that would have to be done every time. Connections become stale for multiple reasons. The most prominent is a firewall in between. Connection problems can lead to connection resets, or there can be simple timeouts.
To add to the other answers:
Yes, you can reuse the same connection for multiple queries. This is even advisable, as creating a new connection is quite expensive.
You can even execute multiple queries concurrently. You just have to use a new java.sql.Statement/PreparedStatement instance for every query. Statements are what JDBC uses to keep track of ongoing queries, so each parallel query needs its own Statement. You can and should reuse Statements for consecutive queries, though.
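A minimal sketch of reusing one PreparedStatement for consecutive executions on the same connection follows. The class name StatementReuse and the table and column names are placeholders, not anything from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class StatementReuse {
    /** Inserts each name using ONE prepared statement; returns the total update count. */
    static int insertNames(Connection connection, List<String> names) throws SQLException {
        int inserted = 0;
        // Prepared once, outside the loop; table and column are placeholders.
        try (PreparedStatement statement =
                 connection.prepareStatement("INSERT INTO people (name) VALUES (?)")) {
            for (String name : names) {
                statement.setString(1, name); // rebind the parameter for each row
                inserted += statement.executeUpdate();
            }
        }
        return inserted;
    }
}
```

Only the parameter binding changes between executions. Queries that need to run in parallel on the same connection would each need their own statement instead.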
The answer to your questions is that they are implementation-defined. A JDBC connection is an interface that exposes methods. What happens behind the scenes can be anything that delivers the interface. For example, consider the Oracle internal JDBC driver, used for supporting Java stored procedures. Simultaneous queries are not only possible on that, they are more or less inevitable, since each request for a new connection returns the one and only connection object. I don't know for sure whether it uses TCP/IP internally, but I doubt it.
So you should not assume implementation details, without being clear about precisely which JDBC implementation you are using.
Since I cannot comment yet, I will post an answer just to comment on Vinegar's answer. setAutoCommit() returning to its default state when a connection is returned to the pool is not mandatory behaviour and should not be taken for granted. The same goes for the closing of statements and result sets: you can read that they should be closed, and that if you do not close them they will be closed automatically when the connection is closed. Don't take that for granted either, since on some versions of JDBC drivers it will tie up your resources.
We had a serious problem on a DB2 database on AS400. People who needed transactional isolation were calling connection.setAutoCommit(false), and after finishing their work they returned the connection to the pool (JNDI) without calling connection.setAutoCommit(old_state). When another thread got this connection from the pool, its inserts and updates were not committed, and nobody could figure out why for a long time...
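One way to guard against that kind of leak is to restore the auto-commit state in a finally block before the connection goes back to the pool. The helper below is a sketch of that idea; Transactions, inTransaction, and SqlWork are hypothetical names, and it uses Java 8 lambdas at the call site.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class Transactions {
    /** Hypothetical callback type for the work done inside the transaction. */
    interface SqlWork {
        void run(Connection connection) throws SQLException;
    }

    /** Runs the work with auto-commit off, then restores the previous setting. */
    static void inTransaction(Connection connection, SqlWork work) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            work.run(connection);
            connection.commit();
        } catch (SQLException | RuntimeException e) {
            connection.rollback();
            throw e;
        } finally {
            // Restore the state the connection had when it was borrowed,
            // so the next borrower does not inherit auto-commit = false.
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A caller would write something like Transactions.inTransaction(connection, c -> { /* inserts and updates */ }); and then close the connection as usual. Some pools can also be configured to reset this state themselves, but as noted above that should not be assumed.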