JDBC Connection Pooling: Connection Reuse? - java

As per my understanding, JDBC Connection Pooling (at a basic level) works this way:
create connections during app initialization and put in a cache
provide these cached connections on demand to the app
a separate thread maintains the Connection Pool, performing activities like:
discard connections that have been used (closed)
create new connections and add to the cache to maintain a specific count of connections
But whenever I hear the term "connection reuse" in a JDBC Connection Pooling discussion, I get confused. When does the connection reuse occur?
Does it mean that the Connection Pool provides the same connection for two different database interactions (without closing it)? Or is there a way to continue using a connection even after it gets closed after a DB call?

Connection pooling works by re-using connections. Applications "borrow" a connection from the pool, then "return" it when finished. The connection is then handed out again to another part of the application, or even a different application.
This is perfectly safe as long as the same connection is not in use by two threads at the same time.
The key point with connection pooling is to avoid creating new connections where possible, since it's usually an expensive operation. Reusing connections is critical for performance.

The connection pool does not provide you with the actual Connection instance from the driver, but returns a wrapper. When you call 'close()' on a Connection instance from the pool, it will not close the driver's Connection, but instead just return the open connection to the pool so that it can be re-used (see skaffman's answer).
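To make the wrapper idea concrete, here is a minimal sketch of a pool whose handed-out connections return themselves on close(). It is not how any particular pool is implemented: ToyPool, borrow(), giveBack() and idleCount() are names invented for this illustration, and a dynamic proxy stands in for the hand-written wrapper class a real pool would typically use.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy pool for illustration only: no validation, no timeouts, no growth. */
final class ToyPool {
    private final Deque<Connection> idle = new ArrayDeque<>();

    ToyPool(Iterable<Connection> physicalConnections) {
        for (Connection physical : physicalConnections) {
            idle.push(physical);
        }
    }

    /** Hands out a fresh wrapper around an idle physical connection. */
    synchronized Connection borrow() throws SQLException {
        Connection physical = idle.poll();
        if (physical == null) {
            throw new SQLException("Pool exhausted");
        }
        InvocationHandler handler = new InvocationHandler() {
            private boolean closed; // state of the logical connection only

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                switch (method.getName()) {
                    case "close":
                        // Logical close: the physical connection stays open and goes back to the pool.
                        if (!closed) {
                            closed = true;
                            giveBack(physical);
                        }
                        return null;
                    case "isClosed":
                        return closed;
                    default:
                        if (closed && method.getDeclaringClass() != Object.class) {
                            throw new SQLException("Connection is closed");
                        }
                        try {
                            return method.invoke(physical, args); // delegate everything else
                        } catch (InvocationTargetException e) {
                            throw e.getCause();
                        }
                }
            }
        };
        return (Connection) Proxy.newProxyInstance(
                ToyPool.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
    }

    private synchronized void giveBack(Connection physical) {
        idle.push(physical);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Each borrow() produces a new wrapper, so a closed wrapper stays closed even though the physical connection behind it is handed out again. Production pools add validation, timeouts and state resets on top of this basic idea.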

Connection pooling reuses connections.
Here is how Apache DBCP works under the hood.
Connection poolableConnection= apacheDbcpDataSource.getConnection();
The Apache DBCP implementation returns a connection wrapper of type PoolableConnection.
poolableConnection.close();
PoolableConnection.close() checks whether the actual underlying connection is closed; if it is not, it returns this PoolableConnection instance to the connection pool (a GenericObjectPool in this case).
if (!isUnderlyingConectionClosed) {
// Normal close: underlying connection is still open, so we
// simply need to return this proxy to the pool
try {
genericObjectPool.returnObject(this); //this is PoolableConnection instance in this case
....
}

My understanding is the same as stated above and, thanks to a bug, I have evidence that it's correct. The application I work with had a bug: an SQL command with an invalid column name, which throws an exception on execution. If the connection is closed after that, then the next time a connection is obtained and used, this time with correct SQL, an exception is thrown again, and the error message is the same as the first time even though the incorrect column name doesn't appear in the second SQL at all. So the connection is obviously being reused. If the connection is not closed after the first exception is thrown (because of the bad column name), then the next time a connection is used everything works just fine. Presumably this is because the first connection hasn't been returned to the pool for reuse. (This bug occurs with Java 1.6_30 and a connection to a MySQL database.)

Related

Am I closing the DB connection correctly? JDBC - DBCP

Will closing the preparedStatement also close and return the connection to the connection pool?
public void insertProjectIntoDatabase(Project project) {
String insertProjectIntoDatabase =
"INSERT INTO projects(Project_Id, Project_Name, Project_StartDate, Deadline) " +
"VALUES (?, ?, ?, ?)";
try {
preparedStatement = DBCPDataSource.getConnection().prepareStatement(insertProjectIntoDatabase);
preparedStatement.setInt(1, project.getProjectId());
preparedStatement.setString(2, project.getName());
preparedStatement.setDate(3, java.sql.Date.valueOf(project.getStartDate()));
preparedStatement.setDate(4, java.sql.Date.valueOf(project.getDeadline()));
preparedStatement.execute();
preparedStatement.close();
}
catch (SQLException e)
{
System.out.println("Error happened in ProjectRepository at insertProjectIntoDatabase(): " + e.getMessage());
}
}
Bonus question:
I have created performance tests for creating a new connection each time an object needs one, Singleton connection and connection pool.
Singleton - Fastest
Creating a new connection each time - Slower (1.2s slower than the one above)
Connection Pool - Slowest (First connection - 2-3s slower than the one above, following tests are 0.4s slower than the one above)
I am using Apache Commons DBCP for the connection pool.
I thought using a connection pool would be just a little slower than a Singleton connection.
Have I done something wrong?
You asked:
Will closing the preparedStatement also close and return the connection to the connection pool?
Start with the documentation:
Releases this Statement object's database and JDBC resources immediately instead of waiting for this to happen when it is automatically closed. It is generally good practice to release resources as soon as you are finished with them to avoid tying up database resources.
Calling the method close on a Statement object that is already closed has no effect.
Note: When a Statement object is closed, its current ResultSet object, if one exists, is also closed.
No mention of closing the connection.
Try intuition: Do we ever run more than one statement in SQL? Yes, obviously. So logically the connection needs to survive across multiple statements to be useful.
Lastly: Try it yourself, an empirical test. Call Connection#isClosed after calling Statement#close.
➥ No, closing the statement does not close the connection.
For the simplest code, learn to use try-with-resources syntax to auto-close your database resources such as result set, statement, and connection. You’ll find many examples of such code on this site, including some written by me.
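As a sketch of what that could look like for the insert in the question: the SQL is taken from the question, but the class name, the method signature, the use of LocalDate for the two dates and the DataSource parameter are assumptions made for this illustration.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalDate;
import javax.sql.DataSource;

final class ProjectInserts {
    private static final String INSERT_PROJECT =
            "INSERT INTO projects(Project_Id, Project_Name, Project_StartDate, Deadline) "
            + "VALUES (?, ?, ?, ?)";

    /** Inserts one project; the statement and then the connection are closed automatically. */
    static void insertProject(DataSource dataSource, int id, String name,
                              LocalDate startDate, LocalDate deadline) throws SQLException {
        // Resources are closed in reverse order of declaration, even if an exception is thrown.
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(INSERT_PROJECT)) {
            statement.setInt(1, id);
            statement.setString(2, name);
            statement.setDate(3, Date.valueOf(startDate));
            statement.setDate(4, Date.valueOf(deadline));
            statement.executeUpdate();
        }
    }
}
```

Because the connection is declared in the try header, close() is called on it on every path, which is what hands it back when the DataSource happens to be a pool.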
As for connection pools, yes, calling close on a connection retrieved from a pool causes the connection object to be returned to the pool. The pool may choose to re-use the connection, or the pool may choose to close the connection. (Not our concern.)
The only point to a connection pool is speed. If opening a connection to the database takes a significant amount of time, we can save that time by re-using existing connection. Generating and re-using connections is the job of a connection pool.
If a connection pool is showing the slowest results in your testing, then there is something seriously wrong with either your pool or your tests. You did not reveal your tests to us, so we cannot help there. Note: As Marmite Bomber commented, be sure your tests do not include the time needed to establish the connection pool.
Frankly, I have found in my experience that opening a database connection does not take a significant amount of time. Furthermore, the details involved in properly implementing a connection pool are complex and treacherous as evidenced by the list of failed and abandoned connection pool implementation projects. That, combined with the inherent risks such as a transaction being left open on a retrieved connection, led me to avoiding the use of connection pools. I would posit that using a connection pool before collecting proof of an actual problem is a case of premature optimization.
I suggest using an implementation of the interface DataSource as a way to mask from the rest of your code whether you are using a pool and to hide which pool implementation you are currently using. Using DataSource gives you the flexibility to change between using or not using a connection pool, and the flexibility to change between pools. Those changes become deployment choices, with no need to change your app programming.
Pools are meant to improve performance, not degrade it. DBCP is naive, complicated, and outdated.
I don't think it's appropriate for a production application, especially when so many drivers support pooling in their DataSource natively. The entire pool gets locked the whole time a new connection attempt is made to the database. So, if something happens to your database that results in slow connections or timeouts, other threads are blocked when they try to return a connection to the pool—even though they are done using a database.
Even c3p0 performs terribly.
Please try one of these two connection pools: Tomcat JDBC Connection Pool or HikariCP.
Now to the main part of your question: have you closed the connection correctly?
Whenever you use a connection pool and fetch an available connection from it, you need not close the connection that you fetched in your DAO layer. The pool manages the connections that you have created, and each connection that the pool lends has a timeout associated with it, before which it has to return to the pool. When the pool is shut down, all its connections are shut down too.
For more information on how to configure these properties in your connection pool, please check the links above for each of the connection pools.

How to verify that connection pooling is working

I have set up connection pooling in my Tomcat configuration, but now I want to verify that it is actually working.
Is there a way to dump out some sort of ID of the active connection so that I can verify the same one is being used between requests? I have checked Oracle's Connection Documentation but to no avail.
Thanks in advance!
A simple way to check pool members are re-used: If your JDBC vendor is using the standard toString from Object you should see the same values printed when you print the connection:
System.out.println("Connection="+conn);
If this changes on each call to get a connection from the pool, then the connection is not the same as before. However, this may not help you at all if your DataSource wraps a pooled connection each time in its own handler class, which is typically done to make close() return the connection to the DataSource and keep the underlying Connection open.
If your JDBC vendor has not used standard toString() you can make your own string to use in debug / test statements:
public String toString(Connection conn) {
return conn.getClass().getName() + "#" + Integer.toHexString(conn.hashCode());
}
System.out.println("Connection="+toString(conn));
Note that the above approach does not guard against rogue code changing elements of the Connection or leaving it in an indeterminate state. For example, I've seen: altered auto-commit modes, selecting another default database schema (Sybase), not committing the previous transaction!
For some DBs you can mitigate with a test query before use but this incurs an overhead.
A simple check would be:
SELECT SID, SERIAL# FROM V$SESSION WHERE SID = SYS_CONTEXT('USERENV', 'SID')
If your pool size is 1, you will get the same values from any connection object. If your pool size is greater (it also depends on whether the pool has a fixed size or is set to grow when needed) and you have many active connections at the same time, you should get up to as many distinct (SID, SERIAL#) pairs as the pool size.
If the connection is not pooled, creating and opening a new connection object will return different values every time.
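The query above could be wrapped in a small helper and logged for every connection you obtain. This is only a sketch: OracleSessionProbe and describeSession are invented names, the query is Oracle-specific, and it assumes the database user is allowed to read V$SESSION.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class OracleSessionProbe {
    private static final String SESSION_QUERY =
            "SELECT SID, SERIAL# FROM V$SESSION WHERE SID = SYS_CONTEXT('USERENV', 'SID')";

    /** Returns "SID,SERIAL#" for the database session behind the given connection. */
    static String describeSession(Connection connection) throws SQLException {
        // Only the statement and result set are closed here; the caller still owns the connection.
        try (Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(SESSION_QUERY)) {
            if (!resultSet.next()) {
                throw new SQLException("No row returned for the current session");
            }
            return resultSet.getLong(1) + "," + resultSet.getLong(2);
        }
    }
}
```

Logging the returned string on each request and seeing the same few values repeat would suggest that physical connections are being reused.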
There's a simple answer and one that makes more work: If you configure a connection pool and don't explicitly open a connection to your database anywhere in your code, the mere nonexistence of manual connection creation should be a clue that something in your connection pool works.
As the connection pool comes from Tomcat, it will also be contained in the data that you can tap into through JMX - enable JMX and connect with your jconsole. This will give you information about the exact load (used connections, free connections, pre-allocated connections) of your connection pool any time you look.

Should I close a Connection obtained from a DataSource manually?

When I get a Connection from the DataSource, should I close it manually? I mean, if I must close it, how will it be used in future requests?
A connection obtained from a connection pool should be used exactly the same as a normal connection. The JDBC 4.2 specification (section 11.1) says about pooling:
When an application is finished using a connection, it closes the logical connection
using the method Connection.close. This closes the logical connection but does
not close the physical connection. Instead, the physical connection is returned to the
pool so that it can be reused.
Connection pooling is completely transparent to the client: A client obtains a pooled
connection and uses it just the same way it obtains and uses a non pooled
connection.
(emphasis mine)
This means that when you are done with a connection, you always call Connection.close()! It doesn't matter if it is a physical connection, or a logical connection from the pool.
The reason is that whether a connection is a physical (direct) connection or a logical connection should be purely a matter of configuration, not a concern of the application code that merely uses the connection.
In the case of a connection pool, the close() will (details may vary, and some implementations are buggy in this respect) invalidate the logical connection and signal to the connection pool that the underlying physical connection is available for re-use. The connection pool may do some validity checks and then return the (physical) connection into the pool or close it (e.g. if the pool has too many idle connections, or the connection is too old, etc.).
Calling close() is not only allowed, it is even vital for the correct working of a connection pool. Not calling close() usually requires some helper thread to close (reclaim) logical connections that have been in use for too long. As this timeout is usually longer than normal application needs, it might lead to exhaustion of the pool, or to configuration where the pool needs a higher maximum number of connections than is really necessary.
You should close the Connection in order to return it to the pool; the next time you call DataSource.getConnection(), a connection will be obtained from the pool. There is no problem here.
Sometimes you don't want to close the connection after each operation, because you use the same connection for several operations. In this case you shouldn't close it until the last operation has finished.
Use try-with-resources to avoid connection problems:
try (Connection con = ds.getConnection();
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(...)) {...}

Java Connection Pooling

I searched for connection pooling and read about it. If I understand it correctly, a connection pool is like a collection of open connections. If a connection is established or created it should be added to the connection pool, and if that connection is closed it should be removed from the connection pool; while it is open I can use it again and again.
While reading these tutorials and explanations about connection pooling I have some questions:
Can a pool of connections only be used on a certain computer? Like ComputerA cannot share its connection pool with ComputerB?
Where should connection.close() be placed?
Is it correct to use a connection ONLY when selecting/loading record? After I got the returned records/data I close the connection at finally statement. Same as adding, editing and deleting records. And while it is processing I place a progress bar so the user will have to wait for it to be completed and to do some process again, which means I will only open connection one at a time.
Thanks for the explanation. :)
Note: I assume we're talking about the java.sql.Connection interface.
Can a pool of connections only be used on a certain computer? Like ComputerA cannot share its connection pool with ComputerB?
A connection exists between a running application and a database. Naturally, two different machines can't share the same running application, so they can't share connections with a database.
Where should connection.close() be placed?
You should always make sure to call close() on a Connection instance after using it (typically in a finally block). If pooling is being used, this will actually return the connection to the pool behind the scenes. Reference: Closing JDBC Connections in Pool
Is it correct to use a connection ONLY when selecting/loading record? After I got the returned records/data I close the connection at finally statement.
Yes, that's correct. You don't want to manually hang on to a Connection reference - use it to execute SQL/DML and then check it back into the pool by calling close() in the finally block, just like you're doing.

What is a Connection in JDBC?

What is a Connection object in JDBC? How is this Connection maintained (I mean, is it a network connection)? Are they TCP/IP connections? Why is it a costly operation to create a Connection every time? Why do these connections become stale after some time, so that I need to refresh the pool? Why can't I use one connection to execute multiple queries?
These connections are TCP/IP connections. To avoid the overhead of creating a new connection every time, there are connection pools that expand and shrink dynamically. You can use one connection for multiple queries. I think you mean that you release it to the pool. If you do that, you might get back the same connection from the pool. In this case it just doesn't matter whether you run one query or several.
The cost of a connection is the connecting itself, which takes some time. And the database prepares some things, like sessions, for every connection. That would have to be done every time. Connections become stale for multiple reasons. The most prominent is a firewall in between. Connection problems could lead to the connection being reset, or there could be simple timeouts.
To add to the other answers:
Yes, you can reuse the same connection for multiple queries. This is even advisable, as creating a new connection is quite expensive.
You can even execute multiple queries concurrently. You just have to use a new java.sql.Statement/PreparedStatement instance for every query. Statements are what JDBC uses to keep track of ongoing queries, so each parallel query needs its own Statement. You can and should reuse Statements for consecutive queries, though.
The answer to your questions is that they are implementation-defined. A JDBC connection is an interface that exposes methods. What happens behind the scenes can be anything that delivers the interface. For example, consider the Oracle internal JDBC driver, used for supporting Java stored procedures. Simultaneous queries are not only possible on that, they are more or less inevitable, since each request for a new connection returns the one and only connection object. I don't know for sure whether it uses TCP/IP internally, but I doubt it.
So you should not assume implementation details, without being clear about precisely which JDBC implementation you are using.
Since I cannot comment yet, I will post an answer just to comment on Vinegar's answer. A connection's setAutoCommit() state being reset to its default when the connection is returned to the pool is not mandatory behaviour and should not be taken for granted. The same goes for the closing of statements and result sets: you can read that they should be closed but that, if you do not close them, they will be closed automatically when the connection is closed. Don't take that for granted either, since on some versions of JDBC drivers it will tie up your resources.
We had a serious problem on a DB2 database on AS400: people who needed transaction isolation were calling connection.setAutoCommit(false) and, after finishing the job, returned the connection to the pool (JNDI) without calling connection.setAutoCommit(old_state). So when another thread got this connection from the pool, its inserts and updates were not committed, and nobody could figure out why for a long time...
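One way to avoid this kind of leak is to restore the auto-commit state in a finally block before the connection goes back to the pool. The following is a sketch only; Transactions, SqlWork and inTransaction are hypothetical names introduced for this example, not part of JDBC or of any pool.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class Transactions {
    /** Hypothetical callback type: the SQL work to run inside one transaction. */
    @FunctionalInterface
    interface SqlWork {
        void run(Connection connection) throws SQLException;
    }

    /** Runs the work in a transaction and restores the previous auto-commit state afterwards. */
    static void inTransaction(DataSource dataSource, SqlWork work) throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            boolean previousAutoCommit = connection.getAutoCommit();
            connection.setAutoCommit(false);
            try {
                work.run(connection);
                connection.commit();
            } catch (SQLException | RuntimeException e) {
                try {
                    connection.rollback();
                } catch (SQLException rollbackFailure) {
                    e.addSuppressed(rollbackFailure); // keep the original failure as the main one
                }
                throw e;
            } finally {
                // Put the connection back in the state we received it in,
                // before close() returns it to the pool.
                connection.setAutoCommit(previousAutoCommit);
            }
        }
    }
}
```

A call might look like Transactions.inTransaction(dataSource, c -> { /* several updates on c */ }). Some pools can be configured to reset this state themselves, but as the answer above says, that should not be taken for granted.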
