If I have a Connection con open and a Statement stat, what does stat.executeBatch() do?
I think it does nothing, because I have set auto-commit to false.
Here is an example:
stat.addBatch("update bankaccount set balance = balance + 100 where customer = 'Bill'");
stat.addBatch("update bankaccount set balance = balance - 100 where customer = 'Joe'");
stat.executeBatch();
con.commit();
Batch Processing allows you to group related SQL statements into a batch and submit them with one call to the database.
When you send several SQL statements to the database at once, you reduce the amount of communication overhead, thereby improving performance.
The addBatch() method of Statement, PreparedStatement, and CallableStatement is used to add individual statements to the batch.
The executeBatch() method is used to execute all of the statements grouped together in the batch.
Also,
To enable manual transaction support instead of the auto-commit mode that the JDBC driver uses by default, use the Connection object's setAutoCommit() method.
If you pass a boolean false to setAutoCommit(), you turn off auto-commit. You can pass a boolean true to turn it back on again.
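Putting the two together, here is a minimal sketch that reuses the statements from the question. executeBatch() sends and executes the grouped statements right away, but with auto-commit turned off nothing becomes permanent until con.commit() is called:

con.setAutoCommit(false);                       // manual transaction mode
try (Statement stat = con.createStatement()) {
    stat.addBatch("update bankaccount set balance = balance + 100 where customer = 'Bill'");
    stat.addBatch("update bankaccount set balance = balance - 100 where customer = 'Joe'");
    stat.executeBatch();                        // both updates are executed here...
    con.commit();                               // ...but only become permanent here
} catch (SQLException e) {
    con.rollback();                             // discard both updates if either failed
    throw e;
}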
TL;DR
I have a Spring Boot application that makes use of parquet files stored on the file system. To access them we are using Apache Drill.
Since I have multiple users that might access them, I've set up a connection pool in Spring.
When I'm using the connection pool, Drill somehow executes a "limit 0" query before executing my actual query, and this affects performance. The same "limit 0" query is NOT executed when I run my queries through a simple Statement obtained from a direct Connection.
This seems to be related to the fact that Spring JdbcTemplate makes use of PreparedStatements instead of simple Statements.
Is there a way to get rid of those "limit 0" queries?
-- Details --
The connection pool in the Spring configuration class looks like this:
@Bean
@ConfigurationProperties(prefix = "datasource.parquet")
@Qualifier("parquetDataSource")
public DataSource parquetDataSource() {
    return DataSourceBuilder.create().build();
}
And the corresponding properties in the development profile YML file are:
datasource:
  parquet:
    url: jdbc:drill:drillbit=localhost:31010
    jdbcUrl: jdbc:drill:drillbit=localhost:31010
    jndiName: jdbc/app_parquet
    driverClassName: org.apache.drill.jdbc.Driver
    maximumPoolSize: 5
    initialSize: 1
    maxIdle: 10
    maxActive: 20
    validation-query: SELECT 1 FROM sys.version
    test-on-borrow: true
When I execute a query using the JdbcTemplate created with the mentioned Drill DataSource, 3 different queries might be executed:
the validation query SELECT 1 FROM sys.version;
a "limit 0" query that looks like SELECT * FROM (<my actual query>) LIMIT 0;
my actual query.
Here's the execution code (parquetJdbcTemplate is an instance of a class that extends org.springframework.jdbc.core.JdbcTemplate):
parquetJdbcTemplate.query(sqlQuery, namedParameters,
    resultSet -> {
        MyResultSet result = new MyResultSet();
        while (resultSet.next()) {
            // populate the "result" object
        }
        return result;
    });
Here's a screenshot from the Profile page of my Drill monitor:
The bottom query is the "limit 0" one; in the middle you have the validation query; and on top (even though the query text is not shown) is the actual query that returns the data I want.
As you can see, the "limit 0" query takes more than a third of the entire execution time. The validation query is fine, since its execution time is negligible and it is needed to check the connection.
The fact is, when I execute the same query using a Connection through the Drill driver (thus, with no pool), I only see my actual query in the UI monitor:
public void executeQuery(String myQuery) throws Exception {
    Class.forName("org.apache.drill.jdbc.Driver");
    Driver.load();
    Connection connection = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
    Statement st = connection.createStatement();
    ResultSet resultSet = st.executeQuery(myQuery);
    while (resultSet.next()) {
        // do stuff
    }
}
As you can see, the total execution time improves by a lot (~14 seconds instead of ~26), just because the "limit 0" query is not executed.
As far as I know, those "limit 0" queries are executed to validate and get information about the underlying schema of the parquet files. Is there a way to disable them while using the connection pool? Ideally, I would like to keep using PreparedStatements over simple Statements, but I could switch to simple Statements if needed, because I have full control over those queries (so no SQL injection should be possible unless someone hacks the deployed artifacts).
You are right: Drill executes a limit 0 query prior to prepared statements in order to get information about the schema. I don't think there is a way to disable this behavior. However, I can recommend enabling the planner.enable_limit0_optimization option, which is false by default; this may speed up limit 0 query execution. Another way to speed up limit 0 queries is to indicate the schema explicitly using casts, either through a view or directly in the queries.
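For reference, a sketch of how that session option could be turned on over the same JDBC connection; dataSource here stands for the pooled Drill DataSource from the question and is just an assumed name:

// Hedged sketch: enable the Drill session option mentioned above.
try (Connection con = dataSource.getConnection();
     Statement st = con.createStatement()) {
    st.execute("ALTER SESSION SET `planner.enable_limit0_optimization` = true");
}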
Regarding the query text not being shown, I think this was fixed in the latest Drill version.
Question background:
1. The database is Neo4j 2.3.1, accessed through the JDBC driver;
2. The DB connection is initialized as a class member, and auto-commit is left at its default (not changed).
To avoid inserting duplicates, I query before inserting. After the program stopped, I found duplicates. Why?
code:
String query = "CREATE (n:LABEL {name:'jack'})";
System.out.println(query);
Statement stmt = dbConnection.createStatement();
stmt.executeUpdate(query);
stmt.close();
Use MERGE + unique constraints instead.
How do you "check"?
You would have to check in the same tx and also take a write lock.
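A sketch of the MERGE + unique constraint suggestion, in the same style as the code above; the constraint statement assumes Neo4j 2.x Cypher syntax:

// One-time setup: a unique constraint so two nodes cannot share the same name.
Statement setup = dbConnection.createStatement();
setup.executeUpdate("CREATE CONSTRAINT ON (n:LABEL) ASSERT n.name IS UNIQUE");
setup.close();

// MERGE matches an existing node or creates it, instead of blindly creating a duplicate.
Statement stmt = dbConnection.createStatement();
stmt.executeUpdate("MERGE (n:LABEL {name:'jack'})");
stmt.close();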
After debugging, I found that for neo4j-jdbc (v2.1.4) the default DB connection transaction isolation level is TRANSACTION_NONE. I then set it to TRANSACTION_READ_COMMITTED and the issue above disappeared. So I think that TRANSACTION_READ_COMMITTED forces the previous insert to be committed, though this is not the recommended way. For isolation levels, refer to: Difference between read commit and repeatable read
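For completeness, the isolation-level change described above is a single call on the existing connection (dbConnection being the class-member connection from the question):

// Raise the isolation level before running the check-then-insert logic.
dbConnection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);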
I have to check the code of a fellow coworker and I stumbled on this piece of code:
private void pdate(JdbcTemplate jdbcTemplate, List<Long> saisineIdsToUpdate, Connection connection) throws SQLException {
    String sqlUpdate = "UPDATE SAISINES SAI WHERE SAI.IDSAISINE = ?"; // request simplified
    PreparedStatement psUpdate = connection.prepareStatement(sqlUpdate);
    for (Long saisineId : saisineIdsToUpdate) {
        psUpdate.setLong(1, saisineId);
        psUpdate.addBatch();
    }
    psUpdate.executeBatch();
    psUpdate.close();
}
The code works, the updates are done correctly, but I cannot find the trace of a connection.commit();
I wonder how it can work without the commit - could someone explain why?
As explained here, JDBC drivers commonly use auto-commit. You can enable database traces via DBMS-driver-specific settings such as showSQL or generateDDL in JPA.
To enable manual transaction support instead of the auto-commit mode that the JDBC driver uses by default, use the Connection object's setAutoCommit() method. If you pass a boolean false to setAutoCommit(), you turn off auto-commit. You can pass a boolean true to turn it back on again.
If you set auto-commit on your connection object to false, then you have to commit the transaction manually:
connection.setAutoCommit(false);
// your code goes here
connection.commit();
If you don't change auto-commit, its value defaults to true and every statement is committed on its own.
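Applied to the batch from the question, a sketch with explicit transaction handling could look like this (same statements as above, only the commit/rollback is added):

connection.setAutoCommit(false);
try (PreparedStatement psUpdate = connection.prepareStatement(sqlUpdate)) {
    for (Long saisineId : saisineIdsToUpdate) {
        psUpdate.setLong(1, saisineId);
        psUpdate.addBatch();
    }
    psUpdate.executeBatch();
    connection.commit();      // the updates become permanent here
} catch (SQLException e) {
    connection.rollback();    // discard the whole batch on failure
    throw e;
}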
I need to insert a couple hundred million records into the MySQL DB. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }

    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();
    System.out.append("inserted " + updateCounts.length);
I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and the rewriteBatchedStatements properties in the connection url.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think it is time to explain how rewriteBatchedStatements=true improves the performance so dramatically. It does so by rewriting prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the mysql server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on MySQL logging (with SET global general_log = 1), which logs every statement sent to the MySQL server into a file.
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It might be that JDBC's .addBatch() does a similar optimization now, though the MySQL addBatch used to be entirely unoptimized and just issued individual queries anyway; I don't know whether that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus doing tens of millions of inserts.
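As a sketch of what that looks like issued through JDBC (the file path and field/line terminators are placeholders, not from the original post):

// Hedged sketch: bulk-load a CSV file with LOAD DATA INFILE.
try (Statement st = connection.createStatement()) {
    st.execute("LOAD DATA INFILE '/tmp/mytable.csv' " +
               "INTO TABLE mytable " +
               "FIELDS TERMINATED BY ',' " +
               "LINES TERMINATED BY '\\n'");
}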
If:
It's a new table, or the amount of data to be inserted is greater than the data already in the table
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
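A sketch of that sequence, assuming the batched INSERTs from the question run in between (note that on some storage engines DISABLE KEYS only affects non-unique indexes):

// Suspend index maintenance during the bulk insert, then rebuild the indexes once.
try (Statement st = connection.createStatement()) {
    st.execute("ALTER TABLE mytable DISABLE KEYS");
    // ... run the batched INSERTs from the question here ...
    st.execute("ALTER TABLE mytable ENABLE KEYS");
}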
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    int count = 1;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        // Send a chunk to the server every maxInsertBatch rows instead of one huge batch
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
        count++;
    }

    // Execute the remaining batch
    pstmt.executeBatch();
    System.out.append("inserted " + vals.length);
I want some advice on some concurrency issues regarding JDBC. I basically need to update a value and then retrieve that value, using an update followed by a select. I'm assuming that by turning auto-commit off, no other transaction can access this table, hence other transactions won't be able to perform update and select queries until this one has been committed.
Below is some example code. Do you think this will work, and does anyone have a better solution for implementing this?
int newVal = -1;
con.setAutoCommit(false);

PreparedStatement statement = con.prepareStatement("UPDATE atable SET val=val+1 WHERE id=?");
statement.setInt(1, id);
int result = statement.executeUpdate();

if (result != 1) {
    throw new SQLException("Nothing updated");
} else {
    statement = con.prepareStatement("SELECT val FROM atable WHERE id=?");
    statement.setInt(1, id);
    ResultSet resultSet = statement.executeQuery();
    if (resultSet.next()) {
        newVal = resultSet.getInt("val");
    }
}

statement.close();
con.commit();
con.setAutoCommit(true);
Thanks.
Assuming you use some form of DataSource, you may configure transactionality and the isolation level there. But to be explicit:
try (Connection con = ds.getConnection()) {
    con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
    con.setAutoCommit(false);
    //...
} catch (SQLException sqle) {
    throw new MyModelException(sqle);
}
Now, you could trigger pessimistic locking by updating a version (or timestamp) field in your table. This will trigger a lock in the database (most likely at the record level):
try (PreparedStatement pStm = con.prepareStatement("update atable set version=version+1 where id=?")) {
    pStm.setInt(1, id);
    pStm.executeUpdate();
}
At this point, if another user is trying to update the same record simultaneously, this connection will either wait or timeout, so you must be ready for both things. The record will not be unlocked until your transaction ends (commit or rollback).
Then, you can safely select and update whatever you want and be sure that nobody else is touching your record as you process your data. If anybody else tries they will be put on wait until you finish (or they will timeout depending on connection configuration).
Alternatively you could use optimistic locking. In this case you read your record and make your modifications, but in the update you make sure nobody else has changed it since you read it, by checking that the version/timestamp field is the same as the one you originally read. In this case you must be prepared to retry the transaction (or abort it altogether) if you realize you have stale/outdated data.
i.e. update atable set afield=? where id=? and version=1
If the number of rows affected is 0, then you know it is probable that the record was updated between your read and your update, and that the record is no longer at version 1.
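A sketch of that check in JDBC, following the example statement above; newValue and versionReadEarlier are assumed variables holding the new field value and the version that was read at the start:

// Optimistic update: the WHERE clause only matches if the version is still the one we read.
try (PreparedStatement pStm = con.prepareStatement(
        "update atable set afield=?, version=version+1 where id=? and version=?")) {
    pStm.setString(1, newValue);
    pStm.setInt(2, id);
    pStm.setInt(3, versionReadEarlier);
    if (pStm.executeUpdate() == 0) {
        // 0 rows affected: someone changed the row since we read it -> retry or abort
    }
}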
Setting autocommit=false on your connection will not prevent other connections/threads from changing the row in the database! It will only disable automatic commits after each JDBC operation on that specific connection.
You will need to lock the row, e.g. with SELECT ... FOR UPDATE, to prevent other transactions from touching the row, and you will also need to do your selects and updates within a single transaction.
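As a sketch, reusing the table and column from the question; the selected row stays locked until the transaction is committed or rolled back:

// Lock the row first, then do the UPDATE and SELECT from the question inside the same transaction.
con.setAutoCommit(false);
try (PreparedStatement lock = con.prepareStatement("SELECT val FROM atable WHERE id=? FOR UPDATE")) {
    lock.setInt(1, id);
    lock.executeQuery();   // the row is now locked for this transaction
    // ... the UPDATE and SELECT from the question go here ...
    con.commit();          // releases the lock
}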
Cheers,