I am very confused by the output of the following code, which tries to avoid Hibernate caching.
I open a fresh Hibernate session, run a query, and inspect the result when execution stops at the indicated breakpoint. Before continuing, I go to MySQL and delete or add a row. When I continue executing, the query still shows the old data and the old row count, in spite of the evictAllRegions() call on the Hibernate cache, while the plain JDBC query shows the updated count (as expected).
Setting hibernate.cache.use_second_level_cache and hibernate.cache.use_query_cache to false didn't help. I guess it shouldn't matter anyway, since the cache is being cleared manually.
So why is Hibernate not hitting the database?
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb...");
for (int i = 0; i < 15; i++) {
    session = HibernateUtil.getSessionFactory().openSession();
    // Old data keeps being returned
    list = session.createCriteria(Language.class).list();
    // JDBC fetches the expected count
    Statement statement = conn.createStatement();
    ResultSet resultSet = statement.executeQuery("select * from language");
    int x = 0;
    while (resultSet.next()) x++; // count the rows
    // [Breakpoint here]
    session.close();
    HibernateUtil.getSessionFactory().getCache().evictAllRegions();
}
I believe this is a result of having the transaction isolation level set to REPEATABLE READ in MySQL.
When you issue the query from your code, MySQL creates a snapshot of the language table that it continues to present for the remainder of that transaction. So the data is effectively cached at MySQL rather than at Hibernate.
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html#isolevel_repeatable-read
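If that is the case, ending the transaction between iterations (or switching the connection to READ COMMITTED) should make the new rows visible. A minimal sketch, assuming the standard Session/Transaction API and reusing the variables from the question:
session = HibernateUtil.getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
list = session.createCriteria(Language.class).list();
tx.commit();      // ends the MySQL transaction and discards its snapshot
session.close();  // the next iteration starts a fresh transaction and sees fresh data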
I need to insert a couple hundred million records into a MySQL database. I'm batch inserting 1 million records at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();
    System.out.println("inserted " + updateCounts.length);
    connection.commit();
} catch (SQLException e) {
    e.printStackTrace();
}
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now seems like a good time to explain how rewriteBatchedStatements=true improves performance so dramatically: it rewrites prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
it sends a single INSERT statement:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on MySQL's general query log (with SET GLOBAL general_log = 1), which logs every statement sent to the server to a file.
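For example, the log can be toggled straight from JDBC (a sketch; it assumes your MySQL account has the privilege to set global variables):
try (Statement logStmt = c.createStatement()) {
    logStmt.execute("SET GLOBAL general_log = 1");  // log every statement sent to the server
    // ... run the batch insert and inspect the log file ...
    logStmt.execute("SET GLOBAL general_log = 0");  // turn logging off again
}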
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES (1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES (1,2,3),(1,2,3),(1,2,3);. (JDBC's .addBatch() may do a similar optimization now, though the MySQL addBatch used to be entirely unoptimized and just issued individual queries anyway; I don't know if that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we saw around a 7-8x speedup doing that versus tens of millions of individual inserts.
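A sketch of what that might look like from JDBC; the file path and the FIELDS/LINES options are assumptions you would adapt to your CSV:
try (Statement st = connection.createStatement()) {
    // Server-side bulk load; the file must be readable by the MySQL server
    st.execute("LOAD DATA INFILE '/tmp/data.csv' INTO TABLE mytable " +
               "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
}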
If:
It's a new table, or the amount to be inserted is greater than the data already in the table
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
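In JDBC terms the sequence would look roughly like this, with tbl_name standing in for your table:
try (Statement st = connection.createStatement()) {
    st.execute("ALTER TABLE tbl_name DISABLE KEYS");  // skip index maintenance during the load
    // ... run all the batched inserts here ...
    st.execute("ALTER TABLE tbl_name ENABLE KEYS");   // rebuild the indexes once, at the end
}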
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        // Flush the batch every maxInsertBatch rows so it doesn't grow unbounded
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute whatever is left in the batch
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + count);
} catch (SQLException e) {
    e.printStackTrace();
}
I am trying to find a way to force Jaybird to do "pagination" in a ResultSet. Suppose we have some long-running SQL query (it returns, for example, 5000 rows in 15 seconds). However, fetching the first 50 (arbitrary) rows takes just a fraction of a second. As long as we do not add an ORDER BY clause to the query, the server quickly returns the first rows, which can be shown immediately in the client application. This is, by the way, the default behaviour of the FlameRobin client.
I tried to simulate this behaviour by setting the Statement parameters as in the code below, but without success. Is there a way to force Jaybird not to load all rows into the ResultSet? I suppose the method stmt.setFetchSize(50) has this purpose, but perhaps I am using it wrong. The Jaybird version used was 2.2.7 and the Firebird version was 2.5.4. Thank you for your advice.
String user = "user";
String pass = "pass";
// s is the JDBC connection string (it contained defaultHoldable; see the answer below)
Connection conn = DriverManager.getConnection(s, user, pass);
conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
conn.setAutoCommit(false);
Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(50);
stmt.setFetchDirection(ResultSet.FETCH_FORWARD);
ResultSet rs = null;
String sql = "select * from TABLE"; // long-running select
boolean ok = stmt.execute(sql);
if (ok) {
    rs = stmt.getResultSet();
    while (rs.next()) {
        // do something
    }
}
if (rs != null) {
    rs.close();
}
if (stmt != null) {
    stmt.close();
}
I am trying to achieve the same thing as the FlameRobin client does: on-the-fly loading of data into the table, only when it is needed (i.e. when scrolling down in the table). We are developing an application that is a client of a two-tier ERP system (Firebird DB server, client on the NetBeans Platform). We wrote some database components that fill a JXTable with data on the same principle as the InterBase Delphi components did in the past. The code above is simplified; in the component code we load the first 100 rows into the JTable table model, and when the user scrolls down we load another 100 rows, and so on. However, I noticed that the load time of the first 100 rows is the same as if we loaded all rows into the dataset. That is, the code
boolean ok = stmt.execute(sql);
if (ok) {
    rs = stmt.getResultSet();
    int rows = 0;
    while (rs.next() && rows < 100) {
        // do something
        rows++;
    }
}
took practically the same time as the first chunk of source code. It seems that the stmt.execute(sql) call waits until all selected rows are returned from the server. However, I specified that I want the rows in chunks of 50, so I supposed the while loop would start immediately after the first 50 rows arrived from the DB server. I want the while loop to start after fetching the first 50 rows (as if I had set stmt.setMaxRows(50)), but with the opportunity to leave the result set open and fetch more rows on demand.
Using setFetchSize normally does exactly what you expect: it fetches rows in batches of the specified size (Firebird 3 can decide to return fewer when it considers the batch too big).
However as you have specified defaultHoldable in your connection string, the result set is HOLD_CURSORS_OVER_COMMIT, and holdable result sets are fully cached client-side before they are returned.
You either need to set the holdability to CLOSE_CURSORS_AT_COMMIT for this specific statement, or, if you do this for all statements, just remove defaultHoldable from your connection string.
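A sketch of the first option, using the three-argument createStatement overload to override the holdability for this one statement:
Statement stmt = conn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY,
        ResultSet.CLOSE_CURSORS_AT_COMMIT);  // not held over commit, so rows are fetched incrementally
stmt.setFetchSize(50);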
I noticed weird behavior in my application. It looks like committed data is not visible right after the commit. The algorithm looks like this:
connection1 - insert into table row with id = 5
connection1 - commit, close
connection2 - open
connection2 - select from table row with id = 5 (no results)
connection2 - insert into table row with id = 5 (PRIMARY KEY VIOLATION; the row is in the DB after all)
If the select on connection2 returns no results, then I do an insert; otherwise an update.
The server has many databases (~200). It looks like the commit completes but the changes show up in the DB later. I use Java and JDBC. Any ideas would be appreciated.
This behavior corresponds to the REPEATABLE READ isolation mode; see SET TRANSACTION:
REPEATABLE READ
All statements of the current transaction can only see rows committed before the first query or data-modification statement was executed in this transaction.
Try connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED) to see if it makes a difference.
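For example, on connection2 before the select (a sketch; the URL and credentials are placeholders):
Connection connection2 = DriverManager.getConnection(url, user, pass);
// Let this transaction see rows committed by other transactions after it started:
connection2.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);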
I want some advice on some concurrency issues regarding JDBC. I basically need to update a value and then retrieve it, using an update followed by a select. I'm assuming that by turning auto-commit off, no other transaction can access this table, and hence other transactions won't be able to perform update and select queries until this one has been committed.
Below is some example code. Do you think this will work, and does anyone have a better solution for implementing this?
int newVal = -1;
con.setAutoCommit(false);
PreparedStatement statement = con.prepareStatement("UPDATE atable SET val=val+1 WHERE id=?");
statement.setInt(1, id);
int result = statement.executeUpdate();
if (result != 1) {
    throw new SQLException("Nothing updated");
} else {
    statement = con.prepareStatement("SELECT val FROM atable WHERE id=?");
    statement.setInt(1, id);
    ResultSet resultSet = statement.executeQuery();
    if (resultSet.next()) {
        newVal = resultSet.getInt("val");
    }
}
statement.close();
con.commit();
con.setAutoCommit(true);
Thanks.
Assuming you use some form of data source, you can configure transactionality and the isolation level there. But to be explicit:
try (Connection con = ds.getConnection()) {
    con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
    con.setAutoCommit(false);
    // ...
} catch (SQLException sqle) {
    throw new MyModelException(sqle);
}
Now, you could trigger pessimistic locking by updating a version (or timestamp) field in your table. This acquires a lock in the database (most likely at the record level):
try (PreparedStatement pStm = con.prepareStatement("update atable set version=version+1 where id=?")) {
    pStm.setInt(1, id);
    pStm.executeUpdate();
}
At this point, if another user tries to update the same record simultaneously, this connection will either wait or time out, so you must be ready for both. The record will not be unlocked until your transaction ends (commit or rollback).
Then you can safely select and update whatever you want, sure that nobody else is touching your record while you process your data. If anybody else tries, they will be made to wait until you finish (or they will time out, depending on the connection configuration).
Alternatively, you could use optimistic locking. In this case you read your record and make your modifications, but in the update you make sure nobody else has changed it since you read it, by checking that the version/timestamp field is the same as the one you originally read. In this case you must be prepared to retry the transaction (or abort it altogether) if you realize you have stale/outdated data.
e.g. update atable set afield=? where id=? and version=1
If the number of rows affected is 0, then you know it is likely that the record was updated between your read and your update, and it is no longer at version 1.
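A minimal sketch of that check, reusing the names from the examples above (the new value, the previously read version, and a String constructor for MyModelException are assumptions):
try (PreparedStatement pStm = con.prepareStatement(
        "update atable set afield=?, version=version+1 where id=? and version=?")) {
    pStm.setString(1, newFieldValue);    // hypothetical new value for the field
    pStm.setInt(2, id);
    pStm.setInt(3, versionReadEarlier);  // the version you read at the start
    if (pStm.executeUpdate() == 0) {
        con.rollback();                  // someone else updated the row: retry or abort
        throw new MyModelException("stale data, please retry");
    }
}
con.commit();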
Setting autocommit=false on your connection will not prevent other connections/threads from changing the row in the database! It only disables automatic commits after each JDBC operation on that specific connection.
You will need to lock the row, e.g. with SELECT ... FOR UPDATE, to block other transactions against that row, and you will also need to do your selects and updates within a single transaction.
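A minimal sketch of that approach against the asker's table, assuming autocommit is already off:
// Lock the row for the duration of the transaction:
PreparedStatement lock = con.prepareStatement("SELECT val FROM atable WHERE id=? FOR UPDATE");
lock.setInt(1, id);
ResultSet rs = lock.executeQuery();
int val = rs.next() ? rs.getInt("val") : -1;  // other transactions now block on this row
// ... run the UPDATE on the same connection ...
con.commit();  // releases the lock
lock.close();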
Cheers,