I have a question about using prepared statements with a connection pool such as c3p0. I have a script running that interacts with my MySQL database around once a second, ad infinitum. Each interaction executes a prepared statement. According to the documentation, a connection should be opened before and closed after each database interaction. My understanding is that the Connection object doesn't actually get destroyed, but is added back to the pool. Since prepared statements are connection dependent, how do I use prepared statements without having to rebuild them every time I get a connection from the pool? Or do I just rebuild the statement after a connection is received from the pool and rely on the pool to do this efficiently via caching?
If your pool implements JDBC transparent Statement caching (as c3p0 does), you just use the ordinary JDBC PreparedStatement API and reuse of cached statements is handled for you.
Internally, what happens is that when you call conn.prepareStatement(...) on some Connection, a lookup is performed on an internal hashtable using a key that includes the Connection's identity, the SQL text, and other characteristics of the requested prepared statement. If a suitable PreparedStatement is found, that is what is passed to the client. If none is, then the prepareStatement call gets passed to the Connection, and the returned PreparedStatement is cached for later reuse.
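To make the lookup described above concrete, here is a minimal toy sketch, not c3p0's actual implementation. The class name StatementCacheSketch, the generic parameters (C stands in for a Connection, S for a PreparedStatement) and the preparer callback are all made up for illustration. It also ignores things a real pool must handle, such as eviction and not handing out a statement that is still in use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

/**
 * Toy model of transparent statement caching. C stands in for a Connection,
 * S for a PreparedStatement; real pools also key on result set type etc.
 */
final class StatementCacheSketch<C, S> {

    /** Cache key: the connection's identity (not equals()) plus the SQL text. */
    private static final class Key {
        final Object connection;
        final String sql;

        Key(Object connection, String sql) {
            this.connection = connection;
            this.sql = sql;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return connection == other.connection && sql.equals(other.sql);
        }

        @Override
        public int hashCode() {
            return 31 * System.identityHashCode(connection) + sql.hashCode();
        }
    }

    private final Map<Key, S> cache = new HashMap<>();
    private final BiFunction<C, String, S> preparer;
    private int misses;

    StatementCacheSketch(BiFunction<C, String, S> preparer) {
        this.preparer = Objects.requireNonNull(preparer);
    }

    /** Returns the cached statement if there is one, otherwise prepares and caches it. */
    synchronized S prepareStatement(C connection, String sql) {
        Key key = new Key(connection, sql);
        S statement = cache.get(key);
        if (statement == null) {
            misses++; // cache miss: delegate to the underlying connection
            statement = preparer.apply(connection, sql);
            cache.put(key, statement);
        }
        return statement;
    }

    /** Number of times the underlying preparer was actually invoked. */
    synchronized int misses() {
        return misses;
    }
}
```

Asking twice for the same SQL on the same connection hits the cache; the same SQL on a different connection is prepared again, which is why the cache has to be sized per connection.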
Statement caching itself has some overhead, and can be tricky to configure. Some newer Connection pools, most notably HikariCP, simply omit support, arguing that caching of PreparedStatements is better left to the DBMS. That, of course, is an empirical question, and will vary from DBMS to DBMS. If you do use Statement caching, the crucial point is that you need to allow for
[num_frequently_used_prepared_statements] * [num_connections]
distinct Statements to be cached. This is tricky to reason about, given the JDBC standard global maxStatements config property defines a global limit, even though PreparedStatements are scoped per-connection.
Much better, if you use c3p0, to set only the (nonstandard) maxStatementsPerConnection property. That should be set to at least the number of PreparedStatements frequently used by your application. You don't have to worry about how many Connections will be open, since maxStatementsPerConnection is scoped per Connection like the Statements themselves are.
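As a rough illustration of the sizing arithmetic, here is a small sketch. The class and method names are hypothetical, and the numbers in the usage note are made up. The key c3p0.maxStatementsPerConnection is how I would expect the property to be spelled in a c3p0.properties file, but please check it against the c3p0 documentation for your version.

```java
import java.util.Properties;

final class StatementCacheSizing {

    /** Global limit needed when only the pool-wide maxStatements is used. */
    static int globalMaxStatements(int frequentStatements, int maxPoolSize) {
        // Every connection needs room for its own copy of each statement.
        return Math.multiplyExact(frequentStatements, maxPoolSize);
    }

    /** Builds c3p0-style properties that size the cache per connection only. */
    static Properties perConnectionConfig(int frequentStatements) {
        Properties props = new Properties();
        props.setProperty("c3p0.maxStatementsPerConnection",
                Integer.toString(frequentStatements));
        return props;
    }
}
```

For example, with 12 frequently used statements and a pool of up to 20 connections, globalMaxStatements(12, 20) gives 240, whereas the per-connection setting is simply 12 no matter how large the pool grows.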
I hope this helps!
Related
Problem:
Program uses com.mchange.v2.c3p0.ComboPooledDataSource to connect to Sybase server
Program executes 2 methods, runSQL1() and runSQL2(), in sequence
runSQL1() executes SQL which creates a #temptable
SELECT * INTO #myTemp FROM TABLE1 WHERE X=2
runSQL2() executes SQL which reads from this #temptable
SELECT * FROM #myTemp WHERE Y=3
PROBLEM: runSQL2() gets handed a different DB connection from the pool than the one handed to runSQL1().
However, Sybase #temptables are connection-specific, therefore runSQL2() fails when it can't find the table.
The most obvious solution I can think of (aside from the degenerate one of making the pool size 1, at which point we don't even need a pool) is to somehow remember which specific connection from the pool was used by runSQL1(), and have runSQL2() request the same connection.
Is there a way to do this in com.mchange.v2.c3p0.ComboPooledDataSource?
If possible, I'd like an answer which is concurrency-safe (in other words, if connection used in runSQL1() is being used by another thread, runSQL2()'s call to get connection will wait until that connection is released by another thread).
However, if that's impossible, I'm OK with the answer which assumes that DB connections (the ones I care about) are all happening in one single thread, and therefore any connection requested by runSQL2() will be 100% available if it was available to runSQL1().
I'm also welcoming of any solutions that address the problem some other way, as long as they don't involve "stop using #temptables" as part of the solution.
The easiest and most obvious way to do that is to request a connection from the pool and then run both runSQL1() and runSQL2() with that connection. The usage pattern suggested in the question goes against the general design principles of connection pool managers, as it would effectively promote them to some kind of transaction manager.
There are Java frameworks that might aid in the above. For example, in Spring, @Transactional or TransactionTemplate can be used to demarcate transaction boundaries, and it will guarantee that a single connection is used by a single thread (or more precisely, according to the transaction propagation settings). Spring can use many transaction managers, but probably the simplest would be DataSourceTransactionManager, which can also be configured to use c3p0 as its DataSource.
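A minimal sketch of that idea, assuming runSQL1() and runSQL2() can be changed to accept a Connection parameter. The SqlStep interface and the runOnOneConnection helper are hypothetical names, not part of c3p0 or JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class SameConnectionSketch {

    /** A unit of work that needs a connection (hypothetical helper type). */
    interface SqlStep {
        void run(Connection connection) throws SQLException;
    }

    /**
     * Checks out one connection and runs every step on it, so session-scoped
     * objects such as #temp tables created by an earlier step stay visible.
     */
    static void runOnOneConnection(DataSource pool, SqlStep... steps) throws SQLException {
        try (Connection connection = pool.getConnection()) {
            for (SqlStep step : steps) {
                step.run(connection);
            }
        } // close() hands the connection back to the pool
    }
}
```

Usage would look something like runOnOneConnection(dataSource, c -> runSQL1(c), c -> runSQL2(c)). Since the connection is checked out for the whole sequence, no other thread can get hold of it in between, which also covers the concurrency concern.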
[Java 7/Oracle] I have a multi-threaded application in which I plan to use a database connection pool. I would like to use prepared statements; however, it seems that prepared statements contain and would therefore seem to be inextricably bound to a single database connection object. The paradigm I want is NOT "open a connection, prepare a statement, do the same query thousands of times, then close the connection" as seems to be the sample code everywhere; the paradigm I want is "precompile this statement so it is run as efficiently as possible - get a random connection from the pool - execute the statement against that connection - release the connection back to the pool". Is this even possible in Java?
If you use a connection pool, then connections are not actually closed when you call close on the connection object. Instead, the connection is returned to the pool. This is usually achieved by wrapping the original connection in a proxy which intercepts your call to close.
Many connection pools and some drivers offer the possibility to cache prepared statements to remedy the problem of continuously preparing the same statements. Of course, because the prepared statement is linked to the connection, you will probably prepare the same statement for every connection in the pool.
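The close-intercepting proxy could be sketched with a JDK dynamic proxy roughly like this. It is only an illustration under simplifying assumptions: the class name and the idle queue are made up, and a real pool would also guard against double close, use after close, and so on.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.Queue;

final class PooledConnectionProxySketch {

    /**
     * Wraps a physical connection so that close() returns it to the idle
     * queue instead of closing it; every other call is forwarded unchanged.
     */
    static Connection wrap(final Connection physical, final Queue<Connection> idle) {
        return (Connection) Proxy.newProxyInstance(
                PooledConnectionProxySketch.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("close")) {
                        idle.offer(physical); // back to the pool, still open
                        return null;
                    }
                    try {
                        return method.invoke(physical, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the driver's own exception
                    }
                });
    }
}
```

The client only ever sees the wrapper, so calling close() on it is cheap and the physical connection stays open for the next caller.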
Imagine a simple connection pool for java.sql.Connection. After the connection has been released back to the pool we have no idea if any transactions are open, any temporary tables created, etc.
Rather than manually checking whether getAutoCommit() is false, then seeing if we need to roll back, calling rollback(), etc., I was hoping there would be a reset() function that does something similar to SQL Server's sp_resetconnection stored procedure but is not DBMS dependent. However, looking at Connection's API, it seems there is not.
Does such a function exist?
There is not. In fact, even the SQL Server connection pool datasource class does not invoke the sp_resetconnection call, as it depends upon the application server (or other application managing the connection pool) to return the connection to a known state. See http://msdn.microsoft.com/en-us/library/ms378484(v=sql.90).aspx.
Various drivers, servers, etc. may have their own features, but I'm relatively certain that there is no non-proprietary or cross-database Java method that would do what the sp_resetconnection proc does.
I don't believe such a function exists. I looked to solve this about a year ago, and ended up having to call the DBMS specific functions.
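For what it's worth, the manual checks mentioned in the question can at least be collected in one place. The following is a best-effort sketch using only standard java.sql.Connection methods; the helper name and the defaultIsolation parameter are my own assumptions. It does not drop temporary tables or undo session-level settings, which still need DBMS-specific calls.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class ConnectionResetSketch {

    /**
     * Best-effort, driver-independent cleanup before a connection is reused.
     * It cannot drop temporary tables or undo session-level settings.
     */
    static void resetToKnownState(Connection connection, int defaultIsolation)
            throws SQLException {
        if (!connection.getAutoCommit()) {
            connection.rollback();          // discard any uncommitted work
            connection.setAutoCommit(true); // JDBC's default mode for new connections
        }
        if (connection.isReadOnly()) {
            connection.setReadOnly(false);
        }
        if (connection.getTransactionIsolation() != defaultIsolation) {
            connection.setTransactionIsolation(defaultIsolation);
        }
        connection.clearWarnings();
    }
}
```

A pool could call something like resetToKnownState(conn, Connection.TRANSACTION_READ_COMMITTED) when a connection is checked back in, with the isolation level being whatever your application treats as its default.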
I'm trying to figure out if it's efficient for me to cache all of my statements when I create my database connection, or if I should only create those that are most used and create the others if/when they're needed.
It seems foolish to create all of the statements in all of the client threads. Any feedback would be greatly appreciated.
Any halfway decent database will already cache them. Just call Connection#prepareStatement() at the moment you actually need to execute the query. You have no other choice anyway, since the connection, statement and result set ought to be acquired and closed in the shortest possible scope, i.e. in a try-finally block in the very same method in which you execute the query.
Opening and closing the connection on every query in turn may indeed be expensive. A common solution to that is using a connection pool, for example c3p0.
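A sketch of the "shortest possible scope" pattern, written with try-with-resources (Java 7+) instead of explicit try-finally. The table and column names are hypothetical placeholders, and the DataSource is assumed to be a pooled one such as c3p0's.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

final class ShortScopeQuerySketch {

    /** Table and column names here are hypothetical placeholders. */
    private static final String SQL = "SELECT name FROM customers WHERE region = ?";

    static List<String> findNames(DataSource pool, String region) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Connection connection = pool.getConnection();
             PreparedStatement statement = connection.prepareStatement(SQL)) {
            statement.setString(1, region);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    names.add(rows.getString(1));
                }
            }
        } // resources close in reverse order, even if an exception is thrown
        return names;
    }
}
```

With a pool in place, getConnection() and close() are cheap, and if the pool caches statements, prepareStatement() on a reused connection is cheap as well.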
I think you're worrying too much; prepared statements already benefit from several levels of caching:
At the database level: a decent database will reuse the access plan for a given prepared statement.
At the connection pool level: a decent connection pool will cache PreparedStatement objects for each database connection in the pool (and return a cached PreparedStatement on subsequent calls to prepareStatement on a connection).
So actually, I would even say that you might be looking in the wrong direction. The best practice if you want to design a scalable solution is to use a connection pool, to not hold a connection longer than needed, and to release it (freeing database resources) when you're done with it.
This sounds to me like the kind of premature optimization that I wouldn't worry about until I have some information telling me that it mattered. If your database access is inefficient, I'd suspect your schema or access of values before I'd think of caching prepared statements.
What is a Connection object in JDBC? How is this connection maintained (I mean, is it a network connection)? Are they TCP/IP connections? Why is it a costly operation to create a connection every time? Why do these connections become stale after some time, so that I need to refresh the pool? Why can't I use one connection to execute multiple queries?
These connections are TCP/IP connections. To avoid the overhead of creating a new connection every time, there are connection pools that expand and shrink dynamically. You can use one connection for multiple queries. I think you mean that you release it to the pool in between. If you do that, you might get back the same connection from the pool. In this case it just doesn't matter whether you run one query or several.
The cost of a connection is the connect itself, which takes some time. And the database prepares some things, like sessions, for every connection. That would have to be done every time. Connections become stale for multiple reasons. The most prominent is a firewall in between. Connection problems can lead to a connection reset, or there can be simple timeouts.
To add to the other answers:
Yes, you can reuse the same connection for multiple queries. This is even advisable, as creating a new connection is quite expensive.
You can even execute multiple queries concurrently. You just have to use a new java.sql.Statement/PreparedStatement instance for every query. Statements are what JDBC uses to keep track of ongoing queries, so each parallel query needs its own Statement. You can and should reuse Statements for consecutive queries, though.
The answer to your questions is that they are implementation defined. A JDBC connection is an interface that exposes methods. What happens behind the scenes can be anything that delivers the interface. For example, consider the Oracle internal JDBC driver, used for supporting Java stored procedures. Simultaneous queries are not only possible on that, they are more or less inevitable, since each request for a new connection returns the one and only connection object. I don't know for sure whether it uses TCP/IP internally, but I doubt it.
So you should not assume implementation details, without being clear about precisely which JDBC implementation you are using.
Since I cannot comment yet, I will post an answer just to comment on Vinegar's answer. Having setAutoCommit() return to its default state when a connection is returned to the pool is not mandatory behaviour and should not be taken for granted. The same goes for the closing of statements and result sets: you can read that they should be closed, but that if you do not close them they will be closed automatically when the connection is closed. Don't take that for granted, since on some versions of JDBC drivers it will tie up your resources.
We had a serious problem on a DB2 database on AS400: people needing transactional isolation were calling connection.setAutoCommit(false), and after finishing their job they returned the connection to the pool (JNDI) without calling connection.setAutoCommit(old_state). So when another thread got this connection from the pool, its inserts and updates were not committed, and nobody could figure out why for a long time...
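One way to avoid that kind of leak is to restore the previous auto-commit state in a finally block before the connection goes back to the pool. This is only a sketch; the TransactionalWork interface and the inTransaction helper are hypothetical names.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class AutoCommitRestoreSketch {

    /** Work to run inside a transaction (hypothetical helper type). */
    interface TransactionalWork {
        void run(Connection connection) throws SQLException;
    }

    static void inTransaction(DataSource pool, TransactionalWork work) throws SQLException {
        try (Connection connection = pool.getConnection()) {
            boolean previous = connection.getAutoCommit();
            connection.setAutoCommit(false);
            try {
                work.run(connection);
                connection.commit();
            } catch (SQLException | RuntimeException e) {
                try {
                    connection.rollback();
                } catch (SQLException rollbackFailure) {
                    e.addSuppressed(rollbackFailure); // keep the original failure
                }
                throw e;
            } finally {
                // Never leak the changed mode to the next user of this connection.
                connection.setAutoCommit(previous);
            }
        }
    }
}
```

Whether the work succeeds or fails, the connection is returned in the auto-commit state it was borrowed in, so the next thread does not silently end up inside an open transaction.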