What is a Connection Object in JDBC ? How is this Connection maintained(I mean is it a Network connection) ? Are they TCP/IP Connections ? Why is it a costly operation to create a Connection every time ? Why do these connections become stale after sometime and I need to refresh the Pool ? Why can't I use one connection to execute multiple queries ?
These connections are TCP/IP connections. To not have to overhead of creating every time a new connection there are connection pools that expand and shrink dynamically. You can use one connection for multiple queries. I think you mean that you release it to the pool. If you do that you might get back the same connection from the pool. In this case it just doesn't matter if you do one or multiple queries
The cost of a connection is to connect which takes some time. ANd the database prepares some stuff like sessions, etc for every connection. That would have to be done every time. Connections become stale through multiple reasons. The most prominent is a firewall in between. Connection problems could lead to connection resetting or there could be simple timeouts
To add to the other answers:
Yes, you can reuse the same connection for multiple queries. This is even advisable, as creating a new connection is quite expensive.
You can even execute multiple queries concurrently. You just have to use a new java.sql.Statement/PreparedStatement instance for every query. Statements are what JDBC uses to keep track of ongoing queries, so each parallel query needs its own Statement. You can and should reuse Statements for consecutive queries, though.
The answers to your questions is that they are implementation defined. A JDBC connection is an interface that exposes methods. What happens behind the scenes can be anything that delivers the interface. For example, consider the Oracle internal JDBC driver, used for supporting java stored procedures. Simultaneous queries are not only possible on that, they are more or less inevitable, since each request for a new connection returns the one and only connection object. I don't know for sure whether it uses TCP/IP internally but I doubt it.
So you should not assume implementation details, without being clear about precisely which JDBC implementation you are using.
since I cannot comment yet, wil post answer just to comment on Vinegar's answer, situation with setAutoCommit() returning to default state upon returning connection to pool is not mandatory behaviour and should not be taken for granted, also as closing of statements and resultsets; you can read that it should be closed, but if you do not close them, they will be automatically closed with closing of connection. Don't take it for granted, since it will take up on your resources on some versions of jdbc drivers.
We had serious problem on DB2 database on AS400, guys needing transactional isolation were calling connection.setAutoCommit(false) and after finishing job they returned such connection to pool (JNDI) without connection.setAutoCommit(old_state), so when another thread got this connection from pool, inserts and updates have not commited, and nobody could figure out why for a long time...
Related
Problem:
Program uses com.mchange.v2.c3p0.ComboPooledDataSource to connect to Sybase server
Program executes 2 methods, runSQL1() and runSQL2(), in sequence
runSQL1() executes SQL which creates a #temptable
SELECT * INTO #myTemp FROM TABLE1 WHERE X=2
runSQL2() executes SQL which reads from this #temptable
SELECT * FROM #myTemp WHERE Y=3
PROBLEM: runSQL2() gets handed a different DB connection from the pool than the one handed to runSQL1().
However, Sybase #temptables are connection-specific, therefore runSQL2() fails when it can't find the table.
The most obvious solution I can think of (aside from degenerate one of making pool size 1, at which point we don't even need a pool), is to somehow remember which specific connection from the pool was used by runSQL1(), and have runSQL2() request the same connection.
Is there a way to do this in com.mchange.v2.c3p0.ComboPooledDataSource?
If possible, I'd like an answer which is concurrency-safe (in other words, if connection used in runSQL1() is being used by another thread, runSQL2()'s call to get connection will wait until that connection is released by another thread).
However, if that's impossible, I'm OK with the answer which assumes that DB connections (the ones I care about) are all happening in one single thread, and therefore any connection requested by runSQL2() will be 100% available if it was available to runSQL1().
I'm also welcoming of any solutions that address the problem some other way, as long as they don't involve "stop using #temptables" as part of the solution.
Easiest and most obvious way to do that is just to request connection from the pool and then run runSQL1() and runSQL2() with that connection. Usage pattern being suggested in the question goes against general design principles of connection pool managers, as it will effectively promote them to some kind of transaction manager.
There are Java frameworks that might aid in the above. For example in Spring #Transaction or TransactionTemplate can be used to demarcate transaction boundaries and it will guarantee that single connection is used by single thread (or more precisely, according to transaction propagation annotations). Spring can use many transaction managers, but probably simplest would be to use DataSourceTransactionManager and it can also be configured to use c3p0 as DataSource.
I have the following scenario:
MethodA() calls MethodB()
MethodB() calls MethodC()
All methods have to execute some query to DB. So, to do this I create a connection object and pass it along the chain of methods to reuse the connection object.
Assumption here is that connection pooling is not being employed.
Now my question is, should only a single connection be opened and reused and be closed at the starting point (in the above example, the connection will be opened and closed in MethodA) ? or should I create a separate connection for each method?
Reusing the connection seems better, but then I will have to keep the connection open till the control comes back to MethodA().
I have read that reusing the connection is better as they are expensive to create. But then I have also read that its better to close the connection as soon as possible, i.e., once you are done with the query call.
Which approach is better and why?
It sounds like you are only querying the DB and not updating or inserting. If that is the case then you avoid many of the transactional semantics in such a nested procedure call.
If that is true, then simply connect once, do all your querying and close the connection. While usage of a connection pool is somewhat orthogonal to your question - use one if you can. They greatly simplify your code because you can have the pool automatically test the connection before it gives one to you. It will auto-create a new connection if the connection was lost (let's say because the DB was bounced).
Finally, you want to minimize the number of times you create a DB Connection BECAUSE it is expensive. However, this is often non-trivial. Databases themselves only support a maximum number of connections. If there are many clients, then you would need to take this into consideration. If you have the trivial case - one database and your program is the only one making connections, then open the connection and leave it open for the duration of the program. This would require you to validate it, so using a DB Pool, size of 1, avoids that.
I'd like to come up with the best approach for re-connecting to MS SQL Server when connection from JBoss AS 5 to DB is lost temporarily.
For Oracle, I found this SO question: "Is there any way to have the JBoss connection pool reconnect to Oracle when connections go bad?" which says it uses an Oracle specific ping routine and utilizes the valid-connection-checker-class-name property described in JBoss' Configuring Datasources Wiki.
What I'd like to avoid is to have another SQL run every time a connection is pulled from the pool which is what the other property check-valid-connection-sql basically does.
So for now, I'm leaning towards an approach which uses exception-sorter-class-name but I'm not sure whether this is the best approach in the case of MS SQL Server.
Hoping to hear your suggestions on the topic. Thanks!
I am not sure it will work the way you describe it (transparently).
The valid connection checker (this can be either a sql statement in the *ds.xml file or a class that does the lifting) is meant to be called when a connection is taken from the pool, as the db could have closed it while it is in the pool. If the connection is no longer valid,
it is closed and a new one is requested from the DB - this is completely transparent to the application and only happens (as you say) when the connection is taken out of the pool. You can then use that in your application for a long time.
The exception sorter is meant to report to the application if e.g. ORA-0815 is a harmless or bad return code for a SQL statement. If it is a harmless one it is basically swallowed, while for a bad one it is reported to the application as an Exception.
So if you want to use the exception sorter to find bad connections in the pool, you need to be prepared that basically every statement that you fire could throw a stale-connection Exception and you would need to close the connection and try to obtain a new one. This means appropriate changes in your code, which you can of course do.
I think firing a cheap sql statement at the DB every now and then to check if a connection from the pool is still valid is a lot less expensive than doing all this checking 'by hand'.
Btw: while there is the generic connection checker sql that works with all databases, some databases provide another way of testing if the connection is good; Oracle has a special ping command for this, which is used in the special OracleConnectionChecker class you refer to. So it may be that there is something similar for MS-SQL, which is less expensive than a simple SQL statement.
I used successfully background validation properties: background-validation-millis from https://community.jboss.org/wiki/ConfigDataSources
With JBoss 5.1 (I don't know with other versions), you can use
<valid-connection-checker-class-name>org.jboss.resource.adapter.jdbc.vendor.MSSQLValidConnectionChecker</valid-connection-checker-class-name>
I'm trying to figure out if it's efficient for me to cache all of my statements when I create my database connection or if I should only create those that are most used and create the others if/when they're needed..
It seems foolish to create all of the statements in all of the client threads. Any feedback would be greatly appreciated.
A bit decent database will already cache them. Just fire Connection#prepareStatement() at the moment you actually need to execute the query. You actually also have no other choice since connection, statement and resultset ought to be acquired and closed in the shortest possible scope, i.e. in a try-finally block in the very same method as you execute the query.
Opening and closing the connection on every query in turn may indeed be expensive. A common solution to that is using a connection pool, for example c3p0.
I think you're worrying too much, prepared statements already benefit from several level of caching:
At the database level: a decent database will reuse the access plan for a given prepared statement.
At the connection pool level: a decent connection pool will cache PreparedStatement objects for each database connection in the pool (and return a cached PreparedStatement on subsequent calls to preparedStatement on a connection).
So actually, I would even say that you might be looking in the wrong direction. The best practice if you want to design a scalable solution is to use a connection pool and to not held a connection longer than needed and to release it (to release database resources) when you're done with it.
This sounds to me like the kind of premature optimization that I wouldn't worry about until I have some information telling me that it mattered. If your database access is inefficient, I'd suspect your schema or access of values before I'd think of caching prepared statements.
I have a Java program consisting of about 15 methods. And, these methods get invoked very frequently during the exeuction of the program. At the moment, I am creating a new connection in every method and invoking statements on them (Database is setup on another machine on the network).
What I would like to know is: Should I create only one connection in the main method and pass it as an argument to all the methods that require a connection object since it would significantly reduce the number of connections object in the program, instead of creating and closing connections very frequently in every method.
I suspect I am not using the resources very efficiently with the current design, and there is a lot of scope for improvement, considering that this program might grow a lot in the future.
Yes, you should consider re-using connections rather than creating a new one each time. The usual procedure is:
make some guess as to how many simultaneous connections your database can sensibly handle (e.g. start with 2 or 3 per CPU on the database machine until you find out that this is too few or too many-- it'll tend to depend on how disk-bound your queries are)
create a pool of this many connections: essentially a class that you can ask for "the next free connection" at the beginning of each method and then "pass back" to the pool at the end of each method
your getFreeConnection() method needs to return a free connection if one is available, else either (1) create a new one, up to the maximum number of connections you've decided to permit, or (2) if the maximum are already created, wait for one to become free
I'd recommend the Semaphore class to manage the connections; I actually have a short article on my web site on managing a resource pool with a Semaphore with an example I think you could adapt to your purpose
A couple of practical considerations:
For optimum performance, you need to be careful not to "hog" a connection while you're not actually using it to run a query. If you take a connection from the pool once and then pass it to various methods, you need to make sure you're not accidentally doing this.
Don't forget to return your connections to the pool! (try/finally is your friend here...)
On many systems, you can't keep connections open 'forever': the O/S will close them after some maximum time. So in your 'return a connection to the pool' method, you'll need to think about 'retiring' connections that have been around for a long time (build in some mechanism for remembering, e.g. by having a wrapper object around an actual JDBC Connection object that you can use to store metrics such as this)
You may want to consider using prepared statements.
Over time, you'll probably need to tweak the connection pool size
You can either pass in the connection or better yet use something like Jakarta Database Connection Pooling.
http://commons.apache.org/dbcp/
You should use a connection pool for that.
That way you could ask for the connection and release it when you are finish with it and return it to the pool
If another thread wants a new connection and that one is in use, a new one could be created. If no other thread is using a connection the same could be re-used.
This way you can leave your app somehow the way it is ( and not passing the connection all around ) and still use the resources properly.
Unfortunately first class ConnectionPools are not very easy to use in standalone applications ( they are the default in application servers ) Probably a microcontainer ( such as Sping ) or a good framework ( such as Hibernate ) could let you use one.
They are no too hard to code one from the scratch though.
:)
This google search will help you to find more about how to use one.
Skim through
Many JDBC drivers do connection pooling for you, so there is little advantage doing additional pooling in this case. I suggest you check the documentation for you JDBC driver.
Another approach to connection pools is to
Have one connection for all database access with synchronised access. This doesn't allow concurrency but is very simple.
Store the connections in a ThreadLocal variable (override initialValue()) This works well if there is a small fixed number of threads.
Otherwise, I would suggest using a connection pool.
If your application is single-threaded, or does all its database operations from a single thread, it's ok to use a single connection. Assuming you don't need multiple connections for any other reason, this would be by far the simplest implementation.
Depending on your driver, it may also be feasible to share a connection between threads - this would be ok too, if you trust your driver not to lie about its thread-safety. See your driver documentation for more info.
Typically the objects below "Connection" cannot safely be used from multiple threads, so it's generally not advisable to share ResultSet, Statement objects etc between threads - by far the best policy is to use them in the same thread which created them; this is normally easy because those objects are not generally kept for too long.