Should autocommit of a datasource be set to false? - java

please see the comments in spring DataSourceTransactionManager.java, function doBegin:
// Switch to manual commit if necessary. This is very expensive in some JDBC drivers,
// so we don't want to do it unnecessarily (for example if we've explicitly
// configured the connection pool to set it already).
if (con.getAutoCommit()) {
    txObject.setMustRestoreAutoCommit(true);
    if (logger.isDebugEnabled()) {
        logger.debug("Switching JDBC Connection [" + con + "] to manual commit");
    }
    con.setAutoCommit(false);
}
In the project I'm working on, autocommit is not configured, so it is true by default. We are using Spring to manage transactions, and all SQL is executed within @Transactional annotated functions, so transactions are actually committed manually. Every time a transaction begins, the connection's autocommit is set to false, and after the transaction exits it is set back to true. A typical workflow would be (at the JDBC level):
conn = dataSource.getConnection();
conn.setAutoCommit(false);
stmt = conn.createStatement();
stmt.executeQuery(...);
conn.commit()/ conn.rollback();
conn.setAutoCommit(true);
Is setting autocommit back and forth expensive? Should we configure the connection pool with autocommit=false for performance reasons, to skip steps 2 and 6?

1) Autocommit behavior depends entirely on the database. What it means is that each and every statement sent through the connection is executed in its own, implicitly committed transaction. Unless you want to manage transactions in your own code, for example to avoid locks being held across multiple statements and conflicting with other users, there is no need to set autocommit to false.
2) From a performance point of view:
a) if you have a lot of users and conflicts are occurring because of database locks being held, then you may need to look into the issues around that; but as a general rule, autocommit was introduced to simplify things for beginners.
b) there may be instances where you need to roll back.
c) you may want to commit the transaction manually based on a specific condition.
EDIT: I see you have edited the question. To answer simply: autocommit=false will force you to write your own commit/rollback/etc.; performance depends entirely on the database and on the number of locks held at a given moment!
No, setting autocommit to false and back to true will not increase the toll on the system.
No, do not configure the connection pool with autocommit=false unless you are doing it for some specific reason and are an experienced person. From a performance point of view, as I already declared, it depends on the type of database and on the real-time users accessing it at an instant; for your project, 99.99 percent you won't need to set it to false.
Setting autocommit to true just ensures that a commit is issued after each and every statement.
I also see that you are getting your connection from a datasource; in such cases it is best to leave the connection with its default settings, so that the next time that connection is fetched from the pool there won't be any trouble with the workflow.
Hope this helped!!

In bulk operations you can turn it off in your session and turn it back on after the bulk operation completes, to gain performance.
SET autocommit=0;
your code here....
SET autocommit=1;
Update:
As @codemania explained very well, there is an option to disable autocommit if your requirement calls for it, but generally you should not. The basic purpose of a transaction is to either successfully commit a set of instructions or roll them all back; if you disable that, how will you achieve it?
Disabling autocommit is useful if you are doing a bulk task such as data migration; for that you can disable autocommit to gain performance, but only in that session.
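In JDBC terms, the session-scoped bulk pattern above might look like the following sketch. The table, column names, and data are hypothetical; batching keeps the whole operation in one transaction with a single commit, and the finally block restores the session's previous mode:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BulkInsertExample {
    static void bulkInsert(Connection conn, List<String> names) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);        // one transaction for the whole batch
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO users(name) VALUES (?)")) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();                // one commit instead of one per row
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previous); // restore the session's previous mode
        }
    }
}
```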

You should set autocommit to false whenever you're issuing database transactions. A database transaction is a logical unit of work, which usually consists of multiple database operations (usually multiple updates), and you want either all of them to succeed or all of them to fail. With autocommit=false, your changes will not be made persistent until you call commit() on the connection object, so this approach guarantees that all your updates either succeed or fail together (by calling rollback() in case of an exception, etc.).
When autocommit is set to true (the default), you can, for instance, change one table, but then on the second update (whether an update, insert, or delete) an exception may occur and your second table doesn't get updated, leaving your database in an inconsistent state.
In conclusion, autocommit=true is OK when you are just reading data, or when the database data model is simple and accessed by few users (so that concurrent access to the same regions of data is very rare and some database inconsistencies can even be tolerated).
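A minimal sketch of such a unit of work, with a hypothetical accounts table and two updates that must succeed or fail together:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TransferExample {
    static void transfer(Connection conn, int fromId, int toId, long cents) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement debit = conn.prepareStatement(
                 "UPDATE accounts SET balance = balance - ? WHERE id = ?");
             PreparedStatement credit = conn.prepareStatement(
                 "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            debit.setLong(1, cents);
            debit.setInt(2, fromId);
            debit.executeUpdate();
            credit.setLong(1, cents);
            credit.setInt(2, toId);
            credit.executeUpdate();
            conn.commit();            // both updates become visible together
        } catch (SQLException e) {
            conn.rollback();          // neither update survives
            throw e;
        } finally {
            conn.setAutoCommit(true); // back to the default mode
        }
    }
}
```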

Related

Java SQLite: Is it necessary to call rollback when auto commit is false and a transaction fails?

When working with a SQLite database in Java, suppose I set auto commit to false. Is it necessary when a SQLException occurs that I call rollback() method? Or can I simply ignore calling it and the transaction will automatically be rolled back (all the changes I made during the transaction will be undone automatically)?
Quick answer: The fact that you're asking means you're doing it wrong, probably. However, if you must know: Yes, you need to explicitly rollback.
What is happening under the hood
At the JDBC level (and if you're using JOOQ, JDBI, Hibernate, or something similar, that's a library built on top of JDBC usually), you have a Connection instance. You'd have gotten this via DriverManager.getConnection(...) - or a connection pooler got it for you, but something did.
That connection can be in the middle of a transaction (auto-commit mode merely means that the connection assumes you meant to write an additional commit() after every SQL statement you care to run in that connection's context, that's all auto-commit does, but, obviously, if that's on, you probably are in a 'clean' state, that is, the last command processed by that connection was either COMMIT or ROLLBACK).
If it is in the middle of a transaction and you close the connection, the ROLLBACK is implicit.
The connection has to make a choice, it can't keep existing, so, it commits or rolls back. The spec guarantees it doesn't just commit for funsies on you, so, therefore, it rolls back.
The question then boils down to your specific setup. This, specifically, is dangerous:
try (Connection con = ...) {
    con.setAutoCommit(false);
    try {
        try (var s = con.createStatement()) {
            s.execute("DROP TABLE foobar");
        }
    } catch (SQLException ignore) {
        // ignoring an exception is usually a bad idea. But for the sake of example..
    }
    // A second statement on the same connection...
    try (var s = con.createStatement()) {
        s.execute("DROP TABLE quux");
    }
}
A JDBC driver is, as far as the spec is concerned, free to throw an SQLException along the lines of 'the connection is aborted; you must explicitly rollback first then you can use it again' on the second statement.
However, the above code is quite bad. You cannot use transaction isolation level SERIALIZABLE at all with this kind of code (once you get more than a handful of users, the app will crash and burn in a cavalcade of retry exceptions), and it is either doing something useless (re-using 1 connection for multiple transactions when you have a connection pooler in use), or is solving a problem badly (the problem of: Using a new connection for every transaction is pricey).
1 transaction, 1 connection
The only reason the above was dangerous is because we're doing two unrelated things (namely: 2 transactions) in a single try-block associated with a connection object. We're re-using the connection. This is a bad idea: connections have baggage associated with them: Properties that were set, and, yes, being in 'abort' state (where an explicit ROLLBACK is required before the connection is willing to execute any other SQL). By just closing the connection and getting a new one, you ditch all that baggage. This is the kind of baggage that results in bugs that unit tests are not going to catch easily, a.k.a. bugs that, if they ever trigger, cost a ton of money / eyeballs / goodwill / time to fix. Objectively you must prefer 99 easy-to-catch bugs if it avoids a single 100x-harder-to-catch bug, and this is one of those bugs that falls in the latter category.
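A sketch of the '1 transaction, 1 connection' shape, assuming a pooled DataSource (the SQL here is just an example):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

class OneTxPerConnection {
    // Each unit of work borrows a fresh connection and closes it,
    // ditching any baggage (settings, abort state) along with it.
    static void runInOwnConnection(DataSource dataSource, String sql) throws SQLException {
        try (Connection con = dataSource.getConnection();
             Statement s = con.createStatement()) {
            s.execute(sql);
        } // close(): an unfinished transaction is rolled back, never committed
    }
}
```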
Connections are pricey? What?
There's one problem with 'just use a connection for a single transaction and then hand it back' (which eliminates the need to roll back, as the connection does that automatically when you close() it): getting connections is quite resource-heavy.
So, folks tend to / should probably be using a connection pooler to avoid this cost. Don't write your own here either; use HikariCP or something like it. These tools pool connections for you: Instead of invoking DriverManager.getConnection, you ask HikariCP for one, and you hand your connection back to HikariCP when you're done with it. Hikari will take care of resetting it for you, which includes rolling back if the connection is halfway inside a transaction, and tackling any other per-connection settings, getting it back to known state.
The common DB interaction model is essentially this 'flow':
someDbAccessorObject.act(db -> {
    // do a single transaction here
});
and that's it. This code, under the hood, does all sorts of things:
Uses a connection pooler.
Sets up the connection in the right fashion, which primarily involves setting auto-commit to false, and setting the right transaction isolation level.
will COMMIT at the end of the lambda block, if no exceptions occurred. Hands back the connection in either case, back to the pool.
Will catch SQLExceptions and analyse whether they are retry exceptions. If yes, applies some randomized exponential backoff and reruns the lambda block (that's what retry exceptions mean).
Takes care of having the code that 'gets' a connection (e.g. determines the right JDBC url to use) in a single place, so that a change in db config does not entail going on a global search/replace spree in your codebase.
In that model, it is somewhat rare that you run into your problem, because you end up in a '1 transaction? 1 connection!' model. Ordinarily that's pricey (creating connections is far more expensive than rolling back/committing as usual and then just continuing with a new transaction on the same connection object), but it boils down to the same thing once a pooler is being used.
In other words: Properly written DB code should not have your problem unless you're writing a connection pooler yourself, in which case the answer is definitely: roll back explicitly.
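A stripped-down sketch of such an accessor (DbAccessor and TxBlock are hypothetical names; retry detection and isolation-level setup are omitted for brevity):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class DbAccessor {
    private final DataSource pool;   // e.g. a HikariCP-backed DataSource
    DbAccessor(DataSource pool) { this.pool = pool; }

    interface TxBlock<T> { T run(Connection con) throws SQLException; }

    // One transaction per borrowed connection: commit on success,
    // roll back on failure, always hand the connection back.
    <T> T act(TxBlock<T> block) throws SQLException {
        try (Connection con = pool.getConnection()) {
            con.setAutoCommit(false);
            try {
                T result = block.run(con);
                con.commit();
                return result;
            } catch (SQLException e) {
                con.rollback();
                throw e; // a real version would detect retryable states and rerun
            }
        } // close() returns the connection to the pool
    }
}
```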

Prepared Statement lifetime and c3p0

I have a question about using prepared statements and a connection pool such as c3p0. I have a script running that interacts with my mysql database around once a second, ad infinitum. Each interaction executes a prepared statement. According to the documentation, a connection should be open and closed before and after a database interaction. My understanding is that the Connection object doesn't actually get destroyed, but added back to the pool. Since prepared statements are connection dependent, how do I use Prepared Statements without having to rebuild them every time that I get a connection from the pool - or do I just rebuild the statement after a connection is received by the pool and rely on the pool to do this efficiently via caching?
If your pool implements JDBC transparent Statement caching (as c3p0 does), you just use the ordinary JDBC PreparedStatement API and reuse of cached statements is handled for you.
Internally what happens is that when you call conn.prepareStatement(...) on some Connection, a lookup is performed on an internal hashtable using a key that includes the Connection's identity, the SQL text, and other characteristics of the requested prepared statement. If a suitable PreparedStatement is found, that is what is passed to the client. If none is, then the prepareStatement call gets passed to the Connection, and the returned PreparedStatement is cached for later reuse.
Statement caching itself has some overhead, and can be tricky to configure. Some newer Connection pools, most notably HikariCP, simply omit support, arguing that caching of PreparedStatements is better left to the DBMS. That, of course, is an empirical question, and will vary from DBMS to DBMS. If you do use Statement caching, the crucial point is that you need to allow for
[num_frequently_used_prepared_statments] * [num_connections]
distinct Statements to be cached. This is tricky to reason about, given that the JDBC-standard maxStatements config property defines a global limit, even though PreparedStatements are scoped per Connection.
Much better, if you use c3p0, to set only the (nonstandard) maxStatementsPerConnection property. That should be set to at least the number of PreparedStatements frequently used by your application. You don't have to worry about how many Connections will be open, since maxStatementsPerConnection is scoped per Connection like the Statements themselves are.
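Assuming c3p0's standard c3p0.properties override mechanism (the file and key names are c3p0 conventions, not taken from this thread), that recommendation might look like:

```properties
# c3p0.properties - cache up to 20 frequently-used statements per pooled connection
c3p0.maxStatementsPerConnection=20
# leave c3p0.maxStatements (the JDBC-standard global limit) unset
```

The value should be at least the number of distinct PreparedStatements your application uses frequently.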
I hope this helps!

Is HikariCP autocommit usage the same as regular java connection autocommit usage?

I recently started using HikariCP. Before that I used my own simple ConnectionPool to meet our needs. In our software we sometimes need to perform multiple database inserts where each insert depends on some validation. More or less like the sample from this site: http://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html#commit_transactions
In my old way, when I was using my own connection pool, I would always call setAutoCommit(false) on the connection object before giving it to the requesting object, so the database manager could manually roll back the data when something went wrong. Like in the sample, if the try catches any exception, it calls the rollback function. When the connection is returned, I call connection.commit() and set autocommit back to true in the connection pool manager.
My question: does HikariCP still use the same procedure for my needs? Meaning, set autocommit to false (I read the manual; you have an autoCommit parameter in the config), then manually roll back or commit the transaction and return the connection to the pool? Or is there some automation where we can just throw an exception and HikariCP will automatically call rollback upon error, or call commit upon connection return, if I do not set the config param autoCommit=false?
Thank you for any info.
Rendra
HikariCP auto-commit behavior is the same as without a pool. If autoCommit=false, you are responsible for commit/rollback in a try-finally. So, yes, you just commit/rollback and then return the connection to the pool.
The truth is that if autoCommit=false, and you run queries without committing, then HikariCP will automatically rollback on return to the pool. However, this is for safety and I discourage you from coding based on this behavior. Doing so will make your code less portable if you ever choose to switch pools.
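The commit/rollback discipline with a pooled DataSource (HikariCP's included) might be sketched like this; the audit table is hypothetical, and the pool is assumed to be configured with autoCommit=false:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

class AuditDao {
    static void insertAudit(DataSource pool, String msg) throws SQLException {
        try (Connection con = pool.getConnection()) { // autoCommit=false via pool config
            try (PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO audit(msg) VALUES (?)")) {
                ps.setString(1, msg);
                ps.executeUpdate();
                con.commit();     // your responsibility with autoCommit=false
            } catch (SQLException e) {
                con.rollback();   // don't rely on the pool's safety-net rollback
                throw e;
            }
        } // close() returns the connection to the pool
    }
}
```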

setAutocommit(true) further explanations

I have come across this Oracle Java tutorial. As a beginner in the topic I cannot grasp why it's needed to call con.setAutoCommit(true); at the end of a transaction.
Here is the oracle explanation:
The statement con.setAutoCommit(true); enables auto-commit mode, which means that each statement is once again committed automatically when it is completed. Then, you are back to the default state where you do not have to call the method commit yourself. It is advisable to disable the auto-commit mode only during the transaction mode. This way, you avoid holding database locks for multiple statements, which increases the likelihood of conflicts with other users.
Could you explain it in other words? especially this bit:
This way, you avoid holding database locks for multiple statements, which increases the likelihood of conflicts with other users.
What do they mean with "holding database locks for multiple statements"?
Thanks in advance.
The database has to perform row-level or table-level locking (based on your database engine in MySQL) to handle transactions. If you keep auto-commit mode off and keep executing statements, these locks won't be released until you commit the transaction. Depending on the lock type, other transactions won't be able to update the row/table that is currently locked. setAutoCommit(true) basically commits the current transaction, releases the locks currently held, and enables auto-commit; that is, until further required, each individual statement is executed and committed.
row-level locks protect the individual rows that take part in the transaction (InnoDB). Table-level locks prevent concurrent access to the entire table (MyIsam).
When one transaction updates a row in the database, other transactions cannot alter this row until the first one finishes (commits or rolls back); therefore, if you do not need transactions, it is advisable to call con.setAutoCommit(true).
With most modern database systems you can batch together a series of SQL statements. Typically the ones you care about are inserts, as these will block out a portion of the space on disk that is being written to. In JDBC this is akin to Statement.addBatch(sql). Now, where this becomes problematic is when you try to implement pessimistic or optimistic locks on tuples in the database. So if you have a series of long-running transactions that execute multiple batches, you can find yourself in a situation where all reads get rejected because of these exclusive locks. I believe in Oracle there is no such thing as a dirty read, so this can potentially be mitigated. But imagine the scenario where you are running a job that attempts to delete a record while I am updating it; this is the type of conflict they are referring to.
With auto-commit on, each part of the batch is saved before moving on to the next unit of work. This is what you see when trying to persist millions of records and it slows down considerably, because the system is ensuring consistency with each insert statement. There is a quick way around this in Oracle (if you are using Oracle): use the oracle.sql package and look at the ARRAY class.
Most databases will autoCommit by default. That means that as soon as you execute a statement the results will immediately appear in the database and everyone else using the database will immediately see them.
There are times, however, when you need to perform a number of changes on the database which must all be done at once and if one fails you want to back out of all of them.
Say you have a cars database and you come across a new car from a new manufacturer. Here you may wish to create the manufacturer entry in your database and the new car record and make sure they both appear at once for other users. Otherwise there may be a confusing moment in your database where one exists without the other.
To achieve this you switch autoCommit off, execute the statements, commit them and then set autoCommit back on. This last switch on of autoCommit is probably what you are seeing.

JDBC - setAutoCommit for read only operation

Let's say I have a common method which creates a DB connection:
Connection getConnection() throws SQLException {
    Connection con = ... // create the connection
    con.setAutoCommit(false);
    return con;
}
I put the setAutoCommit(false) call here so that callers of this method never have to worry about setting it. However, is this a bad practice if the operation executed by the caller is only reading data? Is there any extra overhead?
My personal opinion is that it's better to centralize the logic in one place, that way callers never have to set the auto commit and this avoids code redundancy. I just wanted to make sure it didn't incur any unnecessary overhead for a read only operation.
I put the setAutoCommit(false) call here so that callers of this method never have to worry about setting it.
This is fine IMO and I personally believe that one should never ever enable auto-commit mode inside an application. So my recommendation would be to turn off auto-commit.
However, is this a bad practice if the operation executed by the caller is only reading data? Is there any extra overhead?
From a strict performance point of view, starting and ending a database transaction for every SQL statement has an overhead and may decrease the performance of your application.
By the way, SELECT statements are affected by setAutoCommit(boolean) according to the javadoc:
Sets this connection's auto-commit mode to the given state. If a connection is in auto-commit mode, then all its SQL statements will be executed and committed as individual transactions. Otherwise, its SQL statements are grouped into transactions that are terminated by a call to either the method commit or the method rollback. By default, new connections are in auto-commit mode.
The commit occurs when the statement completes. The time when the statement completes depends on the type of SQL Statement:
For DML statements, such as Insert, Update or Delete, and DDL statements, the statement is complete as soon as it has finished executing.
For Select statements, the statement is complete when the associated result set is closed.
For CallableStatement objects or for statements that return multiple results, the statement is complete when all of the associated result sets have been closed, and all update counts and output parameters have been retrieved.
Autocommit doesn't have any value for SELECT queries. But turning autocommit off is indeed the more common practice: more often than not you'd like to fire queries inside a transaction. Most connection pools also turn it off by default. I would, however, suggest making it a configuration setting of your connection manager and/or overloading the method to take a boolean argument, so that you at least have some control over it case by case.
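The suggested overload might be sketched like this (ConnectionFactory is a hypothetical name for the class owning the shared method):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

class ConnectionFactory {
    private final DataSource dataSource;
    ConnectionFactory(DataSource dataSource) { this.dataSource = dataSource; }

    // Default: manual commit, for transactional callers.
    Connection getConnection() throws SQLException {
        return getConnection(false);
    }

    // Read-only callers can explicitly ask for an auto-commit connection.
    Connection getConnection(boolean autoCommit) throws SQLException {
        Connection con = dataSource.getConnection();
        con.setAutoCommit(autoCommit);
        return con;
    }
}
```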
This is an old question, but I wanted to give a different opinion on the issue.
Performance
The performance overhead from transactions varies with the concurrency control mechanism: normally multi-version concurrency control or locking. The concern expressed in the other answers seems to be down to the cost of ending a transaction, but in my experience the biggest pain is long-running transactions, which can cause performance bottlenecks. For instance, if the DBMS uses locking, some parts of a table cannot be updated until a transaction involving that table has been terminated. More frustrating in systems such as Oracle is that DDL operations (ALTER TABLE, etc.) have to wait until all transactions using that table have ended, leading to troublesome time-outs. So don't think your transaction has no penalty if you're just using SELECTs.
Conventions
A subtle problem with setting the autocommit behaviour off is that you are changing from the default, so anyone else working with your code may not be expecting it. It is really easy to leave an accidental path through a function that does not end with an explicit commit or rollback, and this can lead to unpredictable behaviour in subsequently called functions. Conversely, a large proportion of the DB-interfacing code that I have seen contains a single statement within each function, for which autocommit behaviour is very well suited. In fact a lot of the multi-statement functions I have encountered could have been re-written as single statements with a little more SQL know-how; poor approximations to joins implemented in Java are sadly common.
My personal preference, based on reasonable experience, is as follows for any functions making calls to a database:
keep to the default JDBC behaviour of auto-commit on;
when your function includes more than one SQL statement, use an explicit transaction by setting setAutocommit(false) at the start of each function and calling commit() (or rollback() if appropriate) at the end, and ideally rollback() in the catch block;
enforce the default by putting setAutocommit(true) in the finally block that wraps your JDBC calls in the function (unlike APIs such as PHP/PDO, JDBC won't do this for you after commit()/rollback());
if you're feeling extra defensive, explicitly set your choice of setAutocommit(true) or setAutocommit(false) at the start of every function;
I would never have autoCommit set to true anywhere in the application. The performance overhead, if any at all, is nothing compared to the side effects of an autocommit=true connection.
You say you would never use this connection for DML. But that is only an intention, maintained perhaps by coding standards etc. In practice it is still possible to use this connection for DML statements, and that is reason enough for me never to turn auto-commit on.
Select statements are definitely going to take some memory/CPU/network. Let the overhead of keeping autocommit off be a (very marginal) fixed overhead on every select statement, to make sure the data integrity and stability of your application are maintained.
