Is it safe or recommended to execute independent statements while you have a result set open? Does it matter whether they are attached to a different connection or the same one as the result set? I'm particularly concerned with any locks the result set may hold, which could cause a deadlock.
Ex.
while (resultSet.next()) {
    // execute separate statements here (same or different connection)
}
Also, is a result set backed by an underlying cursor or something else?
Thanks.
There are several questions here.
First, generally "yes" it is possible and common to run other SQL statements while iterating over a ResultSet. And yes, ResultSets are backed by a cursor.
It is also possible to create a deadlock doing this, so you just need to be aware of that. If the SQL executed inside your loop is not modifying rows in the same table as the ResultSet, then you should ensure that the ResultSet is created with a concurrency mode of CONCUR_READ_ONLY and, in general, try to use TYPE_FORWARD_ONLY.
For example:
Statement stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY);
If you use CONCUR_READ_ONLY and TYPE_FORWARD_ONLY, the cursor should not, in general, take locks that block writes. Using the same Connection object is also recommended, because then both the cursor and the SQL that modifies other objects are within the same transaction and are therefore less likely to cause a deadlock.
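For illustration, here is a minimal sketch of the loop from the question under those settings (the tables "orders" and "orders_archive" and their columns are invented); a second statement runs on the same connection while the cursor is open:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorLoopExample {
    // Hypothetical tables "orders" and "orders_archive", for illustration only.
    static void archiveDoneOrders(Connection con) throws SQLException {
        try (Statement stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                  ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = stmt.executeQuery(
                     "SELECT id FROM orders WHERE status = 'DONE'");
             // A separate statement on the SAME connection, so it shares
             // the cursor's transaction.
             PreparedStatement insert = con.prepareStatement(
                     "INSERT INTO orders_archive (order_id) VALUES (?)")) {
            while (rs.next()) {
                insert.setLong(1, rs.getLong("id"));
                insert.executeUpdate(); // executed while the ResultSet is still open
            }
        }
    }
}
Note that the insert targets a different table than the one the cursor is reading, in line with the advice above.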
It's safe, in my view.
The ResultSet is usually linked to the Statement which is usually linked to the Connection.
A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results.
Close ResultSet when finished
Close the ResultSet object as soon as you finish working with it. Even though the Statement object closes its ResultSet implicitly when it is itself closed, closing the ResultSet explicitly gives the garbage collector a chance to reclaim memory as early as possible, because a ResultSet object may occupy a lot of memory depending on the query.
resultSet.close();
It is perfectly safe. You have two basic choices:
Use a different Connection object.
Use the same Connection object.
The primary difference is in how transactions are handled.
Different Connections are always in different transactions.
Different statements on the same Connection can be made part of the same transaction if you set auto-commit mode to false.
Two statements that are not in the same transaction are guaranteed by the SQL server not to interfere with each other (usually this means the server holds onto copies of any modified data for as long as an older transaction might still need them). How the server performs locking (if at all) is also up to it, but it must handle deadlocks when they arise: typically it detects the cycle and rolls back one of the deadlocked transactions, which the client sees as an error, so your code should be prepared to retry. The underlying goal is serializability: concurrent transactions must behave as if they had executed one after another.
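To make the two choices concrete, here is a hedged sketch of the different-Connection variant (table names are invented, and the DataSource is assumed to hand out pooled connections); each connection is its own transaction, so with auto-commit on, every insert commits independently of the open cursor:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

class TwoConnectionsExample {
    // Invented tables "source_table" and "target_table", for illustration only.
    static void copyRows(DataSource dataSource) throws SQLException {
        try (Connection readCon = dataSource.getConnection();
             Connection writeCon = dataSource.getConnection(); // separate transaction
             Statement stmt = readCon.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id FROM source_table");
             PreparedStatement ins = writeCon.prepareStatement(
                     "INSERT INTO target_table (source_id) VALUES (?)")) {
            while (rs.next()) {
                ins.setLong(1, rs.getLong("id"));
                ins.executeUpdate(); // commits independently of the cursor's transaction
            }
        }
    }
}
With the same-Connection variant, you would instead disable auto-commit and call commit() once, so the reads and writes form one transaction.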
Related
I have a question about using prepared statements and a connection pool such as c3p0. I have a script running that interacts with my mysql database around once a second, ad infinitum. Each interaction executes a prepared statement. According to the documentation, a connection should be open and closed before and after a database interaction. My understanding is that the Connection object doesn't actually get destroyed, but added back to the pool. Since prepared statements are connection dependent, how do I use Prepared Statements without having to rebuild them every time that I get a connection from the pool - or do I just rebuild the statement after a connection is received by the pool and rely on the pool to do this efficiently via caching?
If your pool implements JDBC transparent Statement caching (as c3p0 does), you just use the ordinary JDBC PreparedStatement API and reuse of cached statements is handled for you.
Internally, what happens is that when you call conn.prepareStatement(...) on some Connection, a lookup is performed on an internal hashtable using a key that includes the Connection's identity, the SQL text, and other characteristics of the requested prepared statement. If a suitable PreparedStatement is found, that is what is passed to the client. If none is, then the prepareStatement call gets passed to the Connection, and the returned PreparedStatement is cached for later reuse.
Statement caching itself has some overhead, and can be tricky to configure. Some newer Connection pools, most notably HikariCP, simply omit support, arguing that caching of PreparedStatements is better left to the DBMS. That, of course, is an empirical question, and will vary from DBMS to DBMS. If you do use Statement caching, the crucial point is that you need to allow for
[num_frequently_used_prepared_statements] * [num_connections]
distinct Statements to be cached. This is tricky to reason about, given that the JDBC-standard maxStatements config property defines a global limit, even though PreparedStatements are scoped per-Connection.
Much better, if you use c3p0, to set only the (nonstandard) maxStatementsPerConnection property. That should be set to at least the number of PreparedStatements frequently used by your application. You don't have to worry about how many Connections will be open, since maxStatementsPerConnection is scoped per Connection like the Statements themselves are.
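For instance, a minimal c3p0 setup along those lines might look like this (the URL and credentials are placeholders, and 50 is an arbitrary example value):
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class PoolSetup {
    public static ComboPooledDataSource newPool() {
        ComboPooledDataSource cpds = new ComboPooledDataSource();
        cpds.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        cpds.setUser("user");                                // placeholder credentials
        cpds.setPassword("password");

        // Leave the global maxStatements limit off (0 = no global cap) and cap
        // the statement cache per Connection instead; set this to at least the
        // number of PreparedStatements your application uses frequently.
        cpds.setMaxStatements(0);
        cpds.setMaxStatementsPerConnection(50);
        return cpds;
    }
}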
I hope this helps!
When an external event occurs (incoming measurement data) an event handler in my Java code is being called. The data should be written to a MySQL database. Due to the high frequency of these calls (>1000 per sec) I'd like to handle the inserts efficiently. Unfortunately I'm not a professional developer and an idiot with databases.
Neglecting the efficiency aspect my code would look roughly like this:
public class X {
    public void eventHandler(String data) throws SQLException {
        Connection connection = DriverManager.getConnection();
        PreparedStatement statement = connection.prepareStatement("insert …");
        statement.setString(1, data);
        statement.executeUpdate();
        statement.close();
        connection.close();
    }
}
My understanding is that by calling addBatch() and executeBatch() on statement I could limit the physical disk access to let's say every 1000th insert. However as you can see in my code sketch above the statement object is newly instantiated with every call of eventHandler(). Therefore my impression is that the batch mechanism won't be helpful in this context. Same for turning off auto-commit and then calling commit() on the connection object since that one is closed after every insert.
I could turn connection and statement from local variables into class members and reuse them during the whole runtime of the program. But wouldn't it be bad style to keep the database connection open at all time?
A solution would be to buffer the data manually and then write to the database only after collecting a proper batch. But so far I still hope that you smart guys will tell me how to let the database do the buffering for me.
I could turn connection and statement from local variables into class members and reuse them during the whole runtime of the program. But wouldn't it be bad style to keep the database connection open at all time?
Considering that most (database) connection pools are usually configured to keep at least one or more connections open at all times, I wouldn't call it "bad style". This is to avoid the overhead of opening a new connection for each database operation (a new one is opened only when necessary, i.e. when all already-open connections are in use and the pool allows for more).
I'd probably go with some form of batching in this case (but of course I don't know all your requirements, environment etc.). If the data doesn't need to be immediately available somewhere else, you could build some form of job queue for writing the data: push the incoming data there and let one or more other threads take care of writing it to the database in batches of size N. Take a look at the classes available in the java.util.concurrent package.
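Here is a minimal sketch of that idea (the table measurements(value), the DataSource, and the batch size of 1000 are all assumptions): the event handler only enqueues, while a background thread drains the queue and writes JDBC batches:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.sql.DataSource;

public class BatchWriter implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final DataSource dataSource; // e.g. a pooled DataSource

    public BatchWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Called from the event handler: a cheap, non-blocking hand-off.
    public void submit(String data) {
        queue.offer(data);
    }

    @Override
    public void run() {
        List<String> buffer = new ArrayList<>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                buffer.add(queue.take());   // block until at least one item arrives
                queue.drainTo(buffer, 999); // then grab up to 1000 per batch in total
                flush(buffer);
                buffer.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void flush(List<String> buffer) {
        // "measurements(value)" is an invented table, for illustration only.
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO measurements (value) VALUES (?)")) {
            con.setAutoCommit(false);
            for (String data : buffer) {
                ps.setString(1, data);
                ps.addBatch();
            }
            ps.executeBatch(); // one round trip for the whole batch
            con.commit();
        } catch (SQLException e) {
            e.printStackTrace(); // real code should log and handle failures properly
        }
    }
}
This also answers the batching concern from the question: the PreparedStatement lives only for the duration of one flush, yet addBatch()/executeBatch() still amortize the cost across the whole buffer.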
I would suggest you use a LinkedList<> to buffer the data (like a queue), then store the data into the DBMS as and when required from a separate thread, executed at regular intervals (maybe every 2 seconds?).
See how to construct a queue using LinkedList in Java.
When using JDBC in Java, the generally accepted method of querying a database is to acquire a connection, create a statement from that connection, and then execute a query from that statement.
// load driver
Connection con = DriverManager.getConnection(..);
Statement stmt = con.createStatement();
ResultSet result = stmt.executeQuery("SELECT..");
// ...
However, I am unsure of how to treat a second query to the same database.
Can another query be executed safely on the same Statement object, or must another statement be created from the Connection object in order to execute another query?
If the same Statement object can be used for multiple queries, what is the purpose of the Statement class (since it would then make more sense for a Connection.executeQuery() method to exist)?
Yes, you can reuse the Statement object, but each call to executeQuery implicitly closes any ResultSet the Statement previously had open.
See the javadoc for the explanation:
By default, only one ResultSet object per Statement object can be open at the same time. Therefore, if the reading of one ResultSet object is interleaved with the reading of another, each must have been generated by different Statement objects. All execution methods in the Statement interface implicitly close a statement's current ResultSet object if an open one exists.
So the following occurs:
// load driver
Connection con = DriverManager.getConnection(..);
Statement stmt = con.createStatement();
ResultSet result = stmt.executeQuery("select ..");
// do something with result ... or not
ResultSet result2 = stmt.executeQuery("select ...");
// result is now closed, you cannot read from it anymore
// do something with result2
stmt.close(); // will close the resultset bound to it
For example, you can find an open-source implementation of Statement in the jTDS project. In its Statement.executeQuery() method you can see a call to initialize(), which closes all the ResultSets already opened:
protected void initialize() throws SQLException {
updateCount = -1;
resultQueue.clear();
genKeyResultSet = null;
tds.clearResponseQueue();
// FIXME Should old exceptions found now be thrown instead of lost?
messages.exceptions = null;
messages.clearWarnings();
closeAllResultSets();
}
Programmatically, you can reuse the same connection and the same statement for more than one query and close the statement and the connection at the end.
However, this is not a good practice. Application performance is very sensitive to the way the database is accessed. Ideally, each connection should be open for the least amount of time possible, and connections should be pooled. Going by that, you would enclose each query in a block of {open connection, create a prepared statement, run query, close statement, close connection}. This is also the way most SQL templates are implemented. If concurrency permits, you can fire several such queries at the same time using a thread pool.
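A sketch of that per-query pattern with try-with-resources (the DataSource, table, and column names are assumptions):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

class UserDao {
    private final DataSource dataSource; // assumed to be backed by a pool

    UserDao(DataSource dataSource) { this.dataSource = dataSource; }

    String findName(long userId) throws SQLException {
        // {open connection, prepare, run query, close statement, close connection}
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) { // invented table/SQL
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        } // close() here just returns the pooled connection to the pool
    }
}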
I have one thing to add, should you use Connection and Statement in a threaded environment.
My experience shows that stmt.executeQuery(..) is safe to use in a parallel environment, but with the consequence that each query is serialized and thus processed sequentially, not yielding any speed-up.
So it is better to use a new Connection (not just a new Statement) for every thread.
For a standard sequential environment my experience has shown that reusing Statements is no problem at all and ResultSets need not be closed manually.
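One minimal way to sketch the Connection-per-thread approach (URL and credentials are placeholders, and real code would also need to close each thread's Connection eventually):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PerThreadConnections {
    // Each thread lazily opens and then keeps its own Connection.
    private static final ThreadLocal<Connection> CONNECTIONS =
            ThreadLocal.withInitial(() -> {
                try {
                    return DriverManager.getConnection(
                            "jdbc:mysql://localhost:3306/mydb", // placeholder URL
                            "user", "password");                // placeholder credentials
                } catch (SQLException e) {
                    throw new IllegalStateException(e);
                }
            });

    public static Connection get() {
        return CONNECTIONS.get();
    }
}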
I wouldn't worry about creating new statements. However, opening a database connection may be resource-intensive, and opening and closing connections does impact performance.
Managing long-lived connections yourself in some ad-hoc way is usually pretty bad.
You should consider using connection pooling. You still issue a close() call, but that only gives the connection back to the pool; when you request a new connection, the pool reuses the one you gave back earlier.
You may want to have different statements for one connection. Statement is an interface with driver-specific implementations, and depending on what you need you sometimes want to use a CallableStatement, so some logic may be reused when required.
Usually, it's one statement for one query. It might not be strictly necessary, but when writing a real application you don't want to repeat the same steps again and again. That's against the DRY principle, and it will also get more complicated as the application grows.
It's good to write objects that handle that kind of low-level (repetitive) stuff and provide different methods to access the DB by passing in the queries.
Well, that's why we have the concept of classes in object-oriented programming. A class defines constituent members which enable its instances to have state and behavior. Here, a Statement deals with everything related to an SQL statement, and there are many more functions one might perform, like batch queries etc.
We have a Java application in which we use more than one statement variable. We need more than one statement because, while running a loop over one result, inside the loop we need some other query operation to be done. In most places a single stmt is used many times and finally closed. What we would like to confirm now is this: we are not closing the ResultSet variables, and we notice that memory usage fluctuates. So what is the best mechanism, closing the ResultSet immediately after we get the results, or towards the end, just before the stmt is closed?
According to the JDBC Specification and the Statement.close() API doc it should be sufficient:
Note: When a Statement object is closed, its current ResultSet object, if one exists, is also closed.
Based on that, you should be able to assume that you only need to close the statement. However, since a Statement can have a longer lifetime than the use of a single ResultSet obtained from it, it is a good idea in general to close the ResultSet as soon as possible.
A Statement can have a longer lifetime because for example you use it again to execute another query, or in case of a PreparedStatement to execute the query again with different parameters. In the case you execute another query, the previously obtained ResultSet will be closed.
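With Java 7+, try-with-resources makes "close the ResultSet as soon as possible" straightforward; a sketch (the SQL is invented):
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class EarlyCloseExample {
    // stmt stays open for reuse; each ResultSet is closed as soon as the
    // loop finishes, instead of waiting for stmt.close().
    void process(Statement stmt) throws SQLException {
        try (ResultSet rs = stmt.executeQuery("SELECT id FROM t")) { // invented SQL
            while (rs.next()) {
                // ... inner work, using a different Statement if it queries again
            }
        } // rs is closed here, long before stmt itself is closed
    }
}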
Close it when you are finished with it, same as any other resource. No point in holding on to it for longer, especially if you are down to measuring memory, which implies a memory concern.
Let's say I have a common method which creates a DB connection:
Connection getConnection() throws SQLException {
Connection con = ... // create the connection
con.setAutoCommit(false);
return con;
}
I put the setAutoCommit(false) call here so that callers of this method never have to worry about setting it. However, is this a bad practice if the operation executed by the caller is only reading data? Is there any extra overhead?
My personal opinion is that it's better to centralize the logic in one place, that way callers never have to set the auto commit and this avoids code redundancy. I just wanted to make sure it didn't incur any unnecessary overhead for a read only operation.
I put the setAutoCommit(false) call here so that callers of this method never have to worry about setting it.
This is fine IMO and I personally believe that one should never ever enable auto-commit mode inside an application. So my recommendation would be to turn off auto-commit.
However, is this a bad practice if the operation executed by the caller is only reading data? Is there any extra overhead?
From a strict performance point of view, auto-commit mode starts and ends a database transaction for every SQL statement, and that per-statement overhead may decrease the performance of your application.
By the way, SELECT statements are affected by setAutoCommit(boolean), according to the javadoc:
Sets this connection's auto-commit mode to the given state. If a connection is in auto-commit mode, then all its SQL statements will be executed and committed as individual transactions. Otherwise, its SQL statements are grouped into transactions that are terminated by a call to either the method commit or the method rollback. By default, new connections are in auto-commit mode.
The commit occurs when the statement completes. The time when the statement completes depends on the type of SQL Statement:
For DML statements, such as Insert, Update or Delete, and DDL statements, the statement is complete as soon as it has finished executing.
For Select statements, the statement is complete when the associated result set is closed.
For CallableStatement objects or for statements that return multiple results, the statement is complete when all of the associated result sets have been closed, and all update counts and output parameters have been retrieved.
Autocommit doesn't add any value for SELECT queries. But turning autocommit off is indeed the more common practice: more often than not you'll want to fire queries in a transaction, and most connection pools also turn it off by default. I would, however, suggest making it a configuration setting of your connection manager and/or overloading the method to take a boolean argument, so that you at least have some control over it case by case.
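A sketch of that overload idea (the ConnectionManager class and the connection details are hypothetical):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class ConnectionManager {
    private final String url = "jdbc:mysql://localhost:3306/mydb"; // placeholder
    private final String user = "user";                            // placeholder
    private final String password = "password";                    // placeholder

    Connection getConnection() throws SQLException {
        return getConnection(false); // default stays auto-commit off
    }

    Connection getConnection(boolean autoCommit) throws SQLException {
        Connection con = DriverManager.getConnection(url, user, password);
        con.setAutoCommit(autoCommit); // caller chooses per use case
        return con;
    }
}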
This is an old question, but I wanted to give a different opinion on the issue.
Performance
The performance overhead from transactions varies with the concurrency control mechanism: normally multi-version concurrency control or locking. The concern expressed in the other answers seems to be down to the cost of ending a transaction, but in my experience the biggest pain is long-running transactions, which can cause performance bottlenecks. For instance, if the DBMS uses locking, some parts of a table cannot be updated until a transaction involving that table has been terminated. More frustrating in systems such as Oracle is that DDL operations (ALTER TABLE, etc.) have to wait until all transactions using that table have ended, leading to troublesome time-outs. So don't think your transaction has no penalty if you're just using SELECTs.
Conventions
A subtle problem with turning autocommit off is that you are changing from the default, so anyone else working with your code may not be expecting it. It is really easy to leave an accidental path through a function that does not end with an explicit commit or rollback, and this can lead to unpredictable behaviour in subsequently called functions. Conversely, a large proportion of the DB-interfacing code that I have seen contains a single statement within each function, for which autocommit behaviour is very well suited. In fact, a lot of the multi-statement functions I have encountered could have been rewritten as single statements with a little more SQL know-how; poor approximations to joins implemented in Java are sadly common.
My personal preference, based on reasonable experience, is as follows for any functions making calls to a database:
keep to the default JDBC behaviour of auto-commit on;
when your function includes more than one SQL statement, use an explicit transaction by calling setAutoCommit(false) at the start of the function and commit() (or rollback() if appropriate) at the end, plus ideally rollback() in the catch block (see the sketch after this list);
enforce the default by putting setAutoCommit(true) in the finally block that wraps your JDBC calls in the function (unlike APIs such as PHP/PDO, JDBC won't do this for you after commit()/rollback());
if you're feeling extra defensive, explicitly set your choice of setAutoCommit(true) or setAutoCommit(false) at the start of every function.
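Putting points 2 and 3 together, here is a sketch of a multi-statement function under these conventions (the accounts(id, balance) schema is invented):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TransferExample {
    // Invented schema "accounts(id, balance)", for illustration only.
    static void transfer(Connection con, long from, long to, long amount) throws SQLException {
        con.setAutoCommit(false); // point 2: explicit transaction for multiple statements
        try (PreparedStatement debit = con.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
             PreparedStatement credit = con.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            debit.setLong(1, amount);
            debit.setLong(2, from);
            debit.executeUpdate();
            credit.setLong(1, amount);
            credit.setLong(2, to);
            credit.executeUpdate();
            con.commit();
        } catch (SQLException e) {
            con.rollback(); // undo both statements on any failure
            throw e;
        } finally {
            con.setAutoCommit(true); // point 3: restore the default for the next caller
        }
    }
}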
I would never have autoCommit set to true anywhere in the application. The performance overhead, if there is any at all, is nothing compared to the side effects of an autocommit=true connection.
You say you would never use this connection for DML. But that is only an intention, maintained perhaps by coding standards etc. In practice, it is still possible to use this connection for DML statements, and that is reason enough for me never to turn auto-commit on.
Select statements are definitely going to take some memory/CPU/network. Let the overhead of explicit transactions be a (very marginal) fixed cost on every select statement, to make sure the data integrity and stability of your application are maintained.