I'm trying to figure out whether it's efficient for me to cache all of my statements when I create my database connection, or whether I should only create the most-used ones up front and create the others if/when they're needed.
It seems foolish to create all of the statements in all of the client threads. Any feedback would be greatly appreciated.
Any decent database will already cache them. Just fire Connection#prepareStatement() at the moment you actually need to execute the query. You actually have no other choice, since the connection, statement and result set ought to be acquired and closed in the shortest possible scope, i.e. in a try-finally (or try-with-resources) block in the very same method where you execute the query.
Opening and closing the connection on every query in turn may indeed be expensive. A common solution to that is using a connection pool, for example c3p0.
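A minimal sketch of that short-scope pattern, assuming a pooled DataSource (the table and column names here are made up for illustration):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class UserDao {
    private final DataSource dataSource; // e.g. a pooled c3p0 DataSource

    public UserDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Acquire connection, statement and result set in the narrowest scope;
    // try-with-resources closes them in reverse order, even on exceptions.
    public String findNameById(long id) throws SQLException {
        String sql = "SELECT name FROM users WHERE id = ?"; // hypothetical table
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        } // with a pool, close() returns the Connection to the pool, not to the OS
    }
}
```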
I think you're worrying too much; prepared statements already benefit from several levels of caching:
At the database level: a decent database will reuse the access plan for a given prepared statement.
At the connection pool level: a decent connection pool will cache PreparedStatement objects for each database connection in the pool (and return a cached PreparedStatement on subsequent calls to prepareStatement() on a connection).
So actually, I would even say that you might be looking in the wrong direction. The best practice, if you want to design a scalable solution, is to use a connection pool and not to hold a connection longer than needed: release it (freeing database resources) as soon as you're done with it.
This sounds to me like the kind of premature optimization that I wouldn't worry about until I have some information telling me that it mattered. If your database access is inefficient, I'd suspect your schema or access of values before I'd think of caching prepared statements.
I have the following scenario:
MethodA() calls MethodB()
MethodB() calls MethodC()
All methods have to execute some query to DB. So, to do this I create a connection object and pass it along the chain of methods to reuse the connection object.
Assumption here is that connection pooling is not being employed.
Now my question is: should a single connection be opened, reused, and closed at the starting point (in the above example, the connection would be opened and closed in MethodA)? Or should I create a separate connection for each method?
Reusing the connection seems better, but then I will have to keep the connection open till the control comes back to MethodA().
I have read that reusing the connection is better, as connections are expensive to create. But I have also read that it's better to close the connection as soon as possible, i.e., once you are done with the query call.
Which approach is better and why?
It sounds like you are only querying the DB and not updating or inserting. If that is the case then you avoid many of the transactional semantics in such a nested procedure call.
If that is true, then simply connect once, do all your querying, and close the connection. While usage of a connection pool is somewhat orthogonal to your question, use one if you can: pools greatly simplify your code because you can have the pool automatically test the connection before it hands one to you, and auto-create a new connection if the old one was lost (say, because the DB was bounced).
Finally, you want to minimize the number of times you create a DB connection because it is expensive. However, this is often non-trivial: databases themselves only support a maximum number of connections, and if there are many clients you need to take this into consideration. In the trivial case, one database with your program as the only client, you could open the connection and leave it open for the duration of the program; but that requires you to validate it yourself, so using a DB pool of size 1 avoids that.
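A sketch of the connect-once approach, with MethodA owning the connection's lifecycle and the callees simply borrowing it (URL, credentials, and queries are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NestedQueries {

    // methodA owns the connection: it opens it, passes it down, and
    // guarantees it is closed no matter what the callees do.
    public void methodA() throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass")) { // placeholder URL
            methodB(conn);
        } // closed exactly once, here
    }

    private void methodB(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) { // placeholder query
            // ... read rs ...
        }
        methodC(conn);
    }

    private void methodC(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) { // same connection reused
            // ... read rs ...
        }
    }
}
```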
I have a question about using prepared statements and a connection pool such as c3p0. I have a script running that interacts with my mysql database around once a second, ad infinitum. Each interaction executes a prepared statement. According to the documentation, a connection should be opened and closed before and after a database interaction. My understanding is that the Connection object doesn't actually get destroyed, but is added back to the pool. Since prepared statements are connection-dependent, how do I use prepared statements without having to rebuild them every time I get a connection from the pool? Or do I just rebuild the statement after a connection is received from the pool and rely on the pool to do this efficiently via caching?
If your pool implements JDBC transparent Statement caching (as c3p0 does), you just use the ordinary JDBC PreparedStatement API and reuse of cached statements is handled for you.
Internally what happens is that when you call conn.prepareStatement(...) on some Connection, a lookup is performed on an internal hashtable using a key that includes the Connection's identity, the SQL text, and other characteristics of the requested prepared statement. If a suitable PreparedStatement is found, that is what is handed to the client. If none is, the prepareStatement call is passed through to the Connection, and the returned PreparedStatement is cached for later reuse.
Statement caching itself has some overhead, and can be tricky to configure. Some newer Connection pools, most notably HikariCP, simply omit support, arguing that caching of PreparedStatements is better left to the DBMS. That, of course, is an empirical question, and will vary from DBMS to DBMS. If you do use Statement caching, the crucial point is that you need to allow for
[num_frequently_used_prepared_statements] * [num_connections]
distinct Statements to be cached. This is tricky to reason about, given that the JDBC-standard maxStatements config property defines a global limit, even though PreparedStatements are scoped per Connection.
Much better, if you use c3p0, to set only the (nonstandard) maxStatementsPerConnection property. That should be set to at least the number of PreparedStatements frequently used by your application. You don't have to worry about how many Connections will be open, since maxStatementsPerConnection is scoped per Connection like the Statements themselves are.
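For example, a c3p0 setup along those lines might look like this (the driver, URL, credentials, and cache size of 20 are placeholders; only the last two setters concern statement caching):

```java
import javax.sql.DataSource;
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class PoolConfig {
    public static DataSource newPool() throws Exception {
        ComboPooledDataSource cpds = new ComboPooledDataSource();
        cpds.setDriverClass("com.mysql.jdbc.Driver");   // placeholder driver
        cpds.setJdbcUrl("jdbc:mysql://localhost/mydb"); // placeholder URL
        cpds.setUser("user");
        cpds.setPassword("pass");

        // Leave the global maxStatements limit off (0) and size the
        // per-connection cache to cover the frequently used statements:
        cpds.setMaxStatements(0);
        cpds.setMaxStatementsPerConnection(20);
        return cpds;
    }
}
```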
I hope this helps!
I have an application that processes lots of data in files and puts this data into a database. It has been single threaded; so I create a database connection, create prepared statements on that connection, and then reuse these statements while processing the data. I might process thousands of files and can reuse the same prepared statements over and over but only updating the values. This has been working great, however ...
It has come to the point where it is taking too long to process the files, and since they are all independent, I'd like to process them concurrently. The problem is that each file might use, say, 10 prepared statements. So now for each file I'm making a new database connection (even though they are pooled), setting up these 10 prepared statements, and then closing them and the connection down for each file; so this is happening thousands and thousands of times instead of just a single time before.
I haven't actually done any timings but I'm curious if this use of connections and prepared statements is the best way? Is it really expensive to set up these prepared statements over and over again? Is there a better way to do this? I've read that you don't want to share connections between threads but maybe there's a better solution I haven't thought of?
if this use of connections and prepared statements is the best way? Is it really expensive to set up these prepared statements over and over again?
You can reuse the connections and prepared statements over and over again, for sure. You do not have to re-create them, and for the connections, you certainly do not have to reconnect to the database server every time. You should be using a database connection pool at the very least. Also, you cannot use a prepared statement from multiple threads at the same time, and for most JDBC drivers you cannot use the same connection from different threads concurrently either.
That said, it might make sense to do some profiler runs, because threading database code typically provides minimal speedup: you are often limited by database server I/O, not by your threads. This may not be true if you are mixing queries, inserts, and transactions. You might get some concurrency if you are making a remote connection to a database.
To improve the speed of your database operations, consider turning off auto-commit before a large number of operations, or otherwise batching up your requests if you can.
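A sketch of that idea, combining auto-commit off with JDBC batching (the table and column are made up for illustration):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BulkInserter {

    // Turn off auto-commit and batch the inserts, committing once at the
    // end instead of once per row.
    public void insertAll(Connection conn, List<String> names) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO items (name) VALUES (?)")) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
            }
            ps.executeBatch(); // one round trip for the whole batch
            conn.commit();     // one commit instead of one per insert
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit); // restore the previous mode
        }
    }
}
```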
I advise you to use the c3p0 API. Check it out: http://www.mchange.com/projects/c3p0/
Enhanced performance is the purpose of Connection and Statement pooling, especially if you are otherwise acquiring an unpooled Connection for each client access; this is the major goal of the c3p0 library.
This part is taken from the c3p0 documentation about threads and heavy load:
numHelperThreads and maxAdministrativeTaskTime help to configure the behavior of DataSource thread pools. By default, each DataSource has only three associated helper threads. If performance seems to drag under heavy load, or if you observe via JMX or direct inspection of a PooledDataSource, that the number of "pending tasks" is usually greater than zero, try increasing numHelperThreads. maxAdministrativeTaskTime may be useful for users experiencing tasks that hang indefinitely and "APPARENT DEADLOCK" messages.
In addition, I recommend you use Executor and ExecutorService (in java.util.concurrent) to pool your threads.
It looks like the following:

// Fixed-size pool:
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreadsNeeded);
// Or a pool that grows and shrinks on demand:
// ExecutorService executor = Executors.newCachedThreadPool();

executor.execute(runnable);
// ...
executor.shutdown();
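A complete, runnable version of that pattern. The task here just squares a number to show the mechanics; in the question's setting each task would process one file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        // Four worker threads shared by ten tasks.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int n = i;
            results.add(executor.submit(() -> n * n)); // one task per "file"
        }
        int sum = 0;
        for (Future<Integer> f : results) {
            sum += f.get(); // blocks until that task has finished
        }
        executor.shutdown();
        System.out.println(sum); // 0 + 1 + 4 + ... + 81 = 285
    }
}
```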
I'd like to come up with the best approach for re-connecting to MS SQL Server when connection from JBoss AS 5 to DB is lost temporarily.
For Oracle, I found this SO question: "Is there any way to have the JBoss connection pool reconnect to Oracle when connections go bad?" which says it uses an Oracle specific ping routine and utilizes the valid-connection-checker-class-name property described in JBoss' Configuring Datasources Wiki.
What I'd like to avoid is to have another SQL run every time a connection is pulled from the pool which is what the other property check-valid-connection-sql basically does.
So for now, I'm leaning towards an approach which uses exception-sorter-class-name but I'm not sure whether this is the best approach in the case of MS SQL Server.
Hoping to hear your suggestions on the topic. Thanks!
I am not sure it will work the way you describe it (transparently).
The valid connection checker (this can be either a SQL statement in the *-ds.xml file or a class that does the heavy lifting) is meant to be called when a connection is taken from the pool, as the DB could have closed it while it sat in the pool. If the connection is no longer valid, it is closed and a new one is requested from the DB. This is completely transparent to the application and only happens (as you say) when the connection is taken out of the pool. You can then use that connection in your application for a long time.
The exception sorter is meant to report to the application if e.g. ORA-0815 is a harmless or bad return code for a SQL statement. If it is a harmless one it is basically swallowed, while for a bad one it is reported to the application as an Exception.
So if you want to use the exception sorter to find bad connections in the pool, you need to be prepared that basically every statement that you fire could throw a stale-connection Exception and you would need to close the connection and try to obtain a new one. This means appropriate changes in your code, which you can of course do.
I think firing a cheap sql statement at the DB every now and then to check if a connection from the pool is still valid is a lot less expensive than doing all this checking 'by hand'.
Btw: while there is the generic connection checker sql that works with all databases, some databases provide another way of testing if the connection is good; Oracle has a special ping command for this, which is used in the special OracleConnectionChecker class you refer to. So it may be that there is something similar for MS-SQL, which is less expensive than a simple SQL statement.
I successfully used the background validation property background-validation-millis, from https://community.jboss.org/wiki/ConfigDataSources
With JBoss 5.1 (I don't know with other versions), you can use
<valid-connection-checker-class-name>org.jboss.resource.adapter.jdbc.vendor.MSSQLValidConnectionChecker</valid-connection-checker-class-name>
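In context, a *-ds.xml for such a datasource might look roughly like the sketch below. The JNDI name, URL, credentials, and the 60-second validation interval are placeholders; check the JBoss Configuring DataSources wiki for the exact elements supported by your version:

```xml
<datasources>
  <local-tx-datasource>
    <jndi-name>MyDS</jndi-name>
    <connection-url>jdbc:sqlserver://localhost:1433;databaseName=mydb</connection-url>
    <driver-class>com.microsoft.sqlserver.jdbc.SQLServerDriver</driver-class>
    <user-name>user</user-name>
    <password>pass</password>
    <valid-connection-checker-class-name>org.jboss.resource.adapter.jdbc.vendor.MSSQLValidConnectionChecker</valid-connection-checker-class-name>
    <background-validation>true</background-validation>
    <background-validation-millis>60000</background-validation-millis>
  </local-tx-datasource>
</datasources>
```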
What is a Connection object in JDBC? How is this connection maintained (I mean, is it a network connection)? Are they TCP/IP connections? Why is it a costly operation to create a connection every time? Why do these connections become stale after some time, so that I need to refresh the pool? Why can't I use one connection to execute multiple queries?
These connections are TCP/IP connections. To avoid the overhead of creating a new connection every time, there are connection pools that expand and shrink dynamically. You can use one connection for multiple queries. I think what you mean is that you release it to the pool; if you do, you might get back the same connection from the pool, in which case it simply doesn't matter whether you run one query or several.
The cost of a connection is the connect itself, which takes some time. And the database prepares some state (sessions, etc.) for every connection; that would have to be redone every time. Connections become stale for multiple reasons. The most prominent is a firewall in between: connection problems can lead to connection resets, or there can be simple timeouts.
To add to the other answers:
Yes, you can reuse the same connection for multiple queries. This is even advisable, as creating a new connection is quite expensive.
You can even execute multiple queries concurrently. You just have to use a new java.sql.Statement/PreparedStatement instance for every query. Statements are what JDBC uses to keep track of ongoing queries, so each parallel query needs its own Statement. You can and should reuse Statements for consecutive queries, though.
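A sketch of two queries in flight on the same connection, each with its own Statement object. The table and column names are made up for illustration, and some drivers serialize such requests internally, so check your driver's documentation:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ParallelQueries {

    // The outer ResultSet stays open while the inner PreparedStatement
    // runs its own query; each ongoing query has its own Statement.
    public void listOrderCountsPerUser(Connection conn) throws SQLException {
        try (Statement users = conn.createStatement();
             PreparedStatement orders = conn.prepareStatement(
                     "SELECT COUNT(*) FROM orders WHERE user_id = ?");
             ResultSet rs = users.executeQuery("SELECT id, name FROM users")) {
            while (rs.next()) {
                orders.setLong(1, rs.getLong("id"));
                try (ResultSet cnt = orders.executeQuery()) {
                    if (cnt.next()) {
                        System.out.println(
                            rs.getString("name") + ": " + cnt.getInt(1));
                    }
                }
            }
        }
    }
}
```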
The answers to your questions are implementation-defined. A JDBC Connection is an interface that exposes methods; what happens behind the scenes can be anything that delivers that interface. For example, consider the Oracle internal JDBC driver, used for supporting Java stored procedures. Simultaneous queries are not only possible on that, they are more or less inevitable, since each request for a new connection returns the one and only connection object. I don't know for sure whether it uses TCP/IP internally, but I doubt it.
So you should not assume implementation details, without being clear about precisely which JDBC implementation you are using.
Since I cannot comment yet, I will post an answer just to comment on Vinegar's answer. The situation of setAutoCommit() returning to its default state upon returning a connection to the pool is not mandatory behaviour and should not be taken for granted. The same goes for closing statements and result sets: you may read that they will be closed automatically when the connection is closed, but don't take that for granted either, since leaving them open will eat up your resources with some versions of JDBC drivers.
We had a serious problem with a DB2 database on AS400: guys needing transactional isolation were calling connection.setAutoCommit(false), and after finishing the job they returned the connection to the pool (JNDI) without calling connection.setAutoCommit(old_state). So when another thread got this connection from the pool, its inserts and updates were never committed, and nobody could figure out why for a long time...
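A minimal sketch of the discipline that avoids this bug, restoring the connection's state in a finally block before it goes back to the pool:

```java
import java.sql.Connection;
import java.sql.SQLException;

public class TransactionalJob {

    // Restore the auto-commit mode you found before the connection returns
    // to the pool, so the next borrower is not surprised by a mode it
    // did not set.
    public void doWork(Connection conn) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            // ... transactional inserts/updates ...
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit); // undo our change first
            conn.close();                      // then return it to the pool
        }
    }
}
```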