JDBC prepared statement, batch insert performance improvement [duplicate]

I need to insert a couple hundred million records into the MySQL db. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit so the whole batch goes in one transaction
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch and commit
    int[] updateCounts = pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
    e.printStackTrace();
}

I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");

I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now for how rewriteBatchedStatements=true improves performance so dramatically: it rewrites batched prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on the MySQL general log (SET GLOBAL general_log = 1), which logs every statement sent to the server to a file.
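As a minimal sketch of checking this from JDBC itself (assuming a live Connection named connection and a user privileged to set global variables):

try (Statement s = connection.createStatement()) {
    // Enable the general query log, which records every statement the server receives
    s.execute("SET GLOBAL general_log = 1");
    // Find out where the server writes the log file
    try (ResultSet rs = s.executeQuery("SHOW VARIABLES LIKE 'general_log_file'")) {
        if (rs.next()) {
            System.out.println("general log: " + rs.getString("Value"));
        }
    }
}

Run the batch insert, then inspect that file: with rewriteBatchedStatements=true you should see one multi-value INSERT per executeBatch() call instead of n single-row statements.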

You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It might be that JDBC's .addBatch() does a similar optimization now, though the MySQL addBatch used to be entirely un-optimized and just issued individual queries anyhow; I don't know if that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus tens of millions of individual inserts.
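A rough sketch of the LOAD DATA route through JDBC (the file path and the mytable/xxx names are the question's placeholders; the LOCAL variant typically also needs allowLoadLocalInfile=true on the Connector/J URL and local_infile enabled on the server):

try (Statement s = connection.createStatement()) {
    int rows = s.executeUpdate(
        "LOAD DATA LOCAL INFILE '/tmp/data.csv' " +
        "INTO TABLE mytable " +
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' " +
        "(xxx)");
    System.out.println("loaded " + rows + " rows");
}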

If:
It's a new table, or the amount to be inserted is greater than the already inserted data
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
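A sketch of how that wraps the load (with the usual caveat that DISABLE KEYS only affects non-unique indexes on MyISAM tables; runBatchInsert is a hypothetical stand-in for the batched insert loop shown above):

try (Statement s = connection.createStatement()) {
    s.execute("ALTER TABLE mytable DISABLE KEYS");
    runBatchInsert(connection); // your batched INSERTs, as above
    s.execute("ALTER TABLE mytable ENABLE KEYS"); // rebuilds the indexes in one pass
}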

You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");

try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        // Flush the batch every maxInsertBatch rows instead of all at once
        if (++count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute whatever remains in the batch, then commit
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + count);
} catch (SQLException e) {
    e.printStackTrace();
}

Optimising MySQL INSERT with many VALUES (),(),();

I am trying to improve my Java app's performance, and at this point I'm focusing on one endpoint which has to insert a large amount of data into MySQL.
I'm using plain JDBC with the MariaDB Java client driver:
try (PreparedStatement stmt = connection.prepareStatement(
        "INSERT INTO data (" +
        "fId, valueDate, value, modifiedDate" +
        ") VALUES (?,?,?,?)")) {
    for (DataPoint dp : datapoints) {
        stmt.setLong(1, fId);
        stmt.setDate(2, new java.sql.Date(dp.getDate().getTime()));
        stmt.setDouble(3, dp.getValue());
        stmt.setDate(4, new java.sql.Date(modifiedDate.getTime()));
        stmt.addBatch();
    }
    int[] results = stmt.executeBatch();
}
From populating the new DB from dumped files, I know that max_allowed_packet is important and I've got that set to 536,870,912 bytes.
In https://dev.mysql.com/doc/refman/5.7/en/insert-optimization.html it states that:
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements. If you are
adding data to a nonempty table, you can tune the
bulk_insert_buffer_size variable to make data insertion even faster.
See Section 5.1.7, “Server System Variables”.
On my DBs, this is set to 8MB.
I've also read about key_buffer_size (currently set to 16MB).
I'm concerned that these last 2 might not be enough. I can do some rough calculations on the JSON input to this algorithm because it looks something like this:
[{"actualizationDate":null,"data":[{"date":"1999-12-31","value":0},
{"date":"2000-01-07","value":0},{"date":"2000-01-14","value":3144},
{"date":"2000-01-21","value":358},{"date":"2000-01-28","value":1049},
{"date":"2000-02-04","value":-231},{"date":"2000-02-11","value":-2367},
{"date":"2000-02-18","value":-2651},{"date":"2000-02-25","value":-
393},{"date":"2000-03-03","value":1725},{"date":"2000-03-10","value":-
896},{"date":"2000-03-17","value":2210},{"date":"2000-03-24","value":1782},
and it looks like the 8MB configured for bulk_insert_buffer_size could easily be exceeded, if not key_buffer_size as well.
But the MySQL docs only make mention of MyISAM engine tables, and I'm currently using InnoDB tables.
I can set up some tests but it would be good to know how this will break or degrade, if at all.
[EDIT] I have --rewriteBatchedStatements=true. In fact here's my connection string:
jdbc:p6spy:mysql://myhost.com:3306/mydb\
?verifyServerCertificate=true\
&useSSL=true\
&requireSSL=true\
&cachePrepStmts=true\
&cacheResultSetMetadata=true\
&cacheServerConfiguration=true\
&elideSetAutoCommits=true\
&maintainTimeStats=false\
&prepStmtCacheSize=250\
&prepStmtCacheSqlLimit=2048\
&rewriteBatchedStatements=true\
&useLocalSessionState=true\
&useLocalTransactionState=true\
&useServerPrepStmts=true
(from https://github.com/brettwooldridge/HikariCP/wiki/MySQL-Configuration )
An alternative is to execute the batch from time to time. This lets you reduce the size of each batch and focus on more important problems.
int batchSize = 0;
for (DataPoint dp : datapoints) {
    stmt.setLong(1, fId);
    stmt.setDate(2, new java.sql.Date(dp.getDate().getTime()));
    stmt.setDouble(3, dp.getValue());
    stmt.setDate(4, new java.sql.Date(modifiedDate.getTime()));
    stmt.addBatch();
    // When the limit is reached, execute and reset the counter
    if (++batchSize >= BATCH_LIMIT) {
        stmt.executeBatch();
        batchSize = 0;
    }
}
// Execute the remaining items
if (batchSize > 0) {
    stmt.executeBatch();
}
I generally use a constant, or a parameter based on the DAO implementation to be more dynamic, but a batch of 10_000 rows is a good start.
private static final int BATCH_LIMIT = 10_000;
Note that it is not necessary to clear the batch after an execution. Even though this is not spelled out in the Statement.executeBatch documentation, it is in the JDBC 4.3 specification:
14 Batch Updates
14.1 Description of Batch Updates
14.1.2 Successful Execution
Calling the method executeBatch closes the calling Statement object’s current result set if one is open.
The statement’s batch is reset to empty once executeBatch returns.
Managing the results is a bit more involved, but you can still concatenate the update counts if you need them; they can be analyzed at any time, since no ResultSet remains open.
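As a sketch (assuming java.util imports; the names are illustrative), you can collect the counts returned by each partial executeBatch() call and flatten them at the end:

List<int[]> partialCounts = new ArrayList<>();
for (DataPoint dp : datapoints) {
    // ... set parameters and addBatch() as before ...
    if (++batchSize >= BATCH_LIMIT) {
        partialCounts.add(stmt.executeBatch());
        batchSize = 0;
    }
}
if (batchSize > 0) {
    partialCounts.add(stmt.executeBatch());
}
// Flatten the per-batch update counts into a single array
int[] allCounts = partialCounts.stream().flatMapToInt(Arrays::stream).toArray();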


Inserting data into Multiple tables

String insert1 = "INSERT INTO Table1(Col1, col2, col3)"
+ "VALUES(?,?,?)";
String insert2 = "INSERT INTO Table2(Colx, coly)"
+ "VALUES(?,?)";
Connection conn = aConn;
PreparedStatement ps = null;
try {
ps = conn.prepareStatement(insert1);
// ps.addBatch(insert2);
I'm trying to insert data into multiple tables at a time, and it seems that addBatch(String sql) is not defined for PreparedStatement.
Is there an alternative way?
First of all, a PreparedStatement is used to cache a single SQL statement. This has the advantage that the driver/database might optimize the statement since it is expecting many of them and since it is a parameterized statement. If you want to use it for two different SQL statements you need two PreparedStatements.
In order to add rows to the statement you need to set your parameters using set*(1,...), set*(2,...), set*(3,...), etc. and then call addBatch() (no arguments!). Finally you submit the batch of statements using executeBatch().
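A minimal sketch with the question's two inserts (the values are placeholders): each table gets its own PreparedStatement and its own batch:

try (PreparedStatement ps1 = conn.prepareStatement(insert1);
     PreparedStatement ps2 = conn.prepareStatement(insert2)) {
    // Queue a row for Table1
    ps1.setString(1, "v1");
    ps1.setString(2, "v2");
    ps1.setString(3, "v3");
    ps1.addBatch();
    // Queue a row for Table2
    ps2.setString(1, "x1");
    ps2.setString(2, "x2");
    ps2.addBatch();
    // Each statement executes its own batch
    ps1.executeBatch();
    ps2.executeBatch();
}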

java JDBC inserting many rows into tables

I have a CSV file with 80,000 rows;
each row has: cost;date (e.g. 123.232;30/12/2008)
I have to insert each cost value into a table named after the date in the second field, for example:
the cost 123.232 becomes a row in the "30/12/2008" table,
and I have many rows like this.
My program currently looks like the code below.
I have to build the SQL query inside the for loop because I need the "date" parameter;
my question: how can I move the pStatement = connection.prepareStatement("INSER...." line out of the for loop while still getting the "date" parameter?
Why do I need that? Because right now addBatch() only ends up adding the last row to the database.
If I move pStatement.executeBatch() inside the for loop, it will no longer work as a batch but as ordinary one-by-one inserts.
I'm using batching because I need the application to be fast.
Any advice is welcome.
Database database = new Database();
Connection connection = database.GetConnection();
PreparedStatement pStatement = null;
for (int x = 0; x < allRows.size(); x++) {
    Rows row = allRows.get(x);
    // Re-preparing the statement each iteration throws away the previous batch
    pStatement = connection.prepareStatement(
        "INSERT INTO \"" + row.getDate() + "\" (cost) VALUES (?);");
    pStatement.setLong(1, row.getCost());
    pStatement.addBatch();
}
pStatement.executeBatch(); // only the last statement's single-entry batch runs
connection.close();
I think you should split allRows into multiple lists, one per date; then you can move the prepared statement out of the inner loop (sort of), as sketched below. It will not be exactly what you want, but there will be one batch per date. I think that is a compromise you will have to make.
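A sketch of that idea (assuming the question's Rows type and java.util imports): group the rows by date, then run one batched statement per date table:

Map<String, List<Rows>> rowsByDate = new HashMap<>();
for (Rows row : allRows) {
    rowsByDate.computeIfAbsent(row.getDate(), d -> new ArrayList<>()).add(row);
}
for (Map.Entry<String, List<Rows>> entry : rowsByDate.entrySet()) {
    // One prepared statement, and one batch, per date table
    try (PreparedStatement ps = connection.prepareStatement(
            "INSERT INTO \"" + entry.getKey() + "\" (cost) VALUES (?);")) {
        for (Rows row : entry.getValue()) {
            ps.setLong(1, row.getCost());
            ps.addBatch();
        }
        ps.executeBatch();
    }
}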
Not sure how good this is, so I'll post as community wiki. The rough idea, as pseudocode (not working Java): keep a mutable holder whose toString() yields the current table name and rebuild the statement from it inside the loop:
// obj1 stands in for a holder whose toString() gives the table name
Object obj1 = new Object();
PreparedStatement pStatement =
    connection.prepareStatement("INSERT INTO " + obj1.toString());
while (moreRows) {
    // update the holder's value, set parameters, addBatch(), ...
}
You could just use jdbcTemplate.batchUpdate. See an example here: http://www.mkyong.com/spring/spring-jdbctemplate-batchupdate-example/
This allows you to use different SQL statements but still execute them in a batch.
With PreparedStatement you would have to group together all inserts with the same date value and prepare a different statement for each of those dates.
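A minimal sketch of the batchUpdate(String...) variant (assuming Spring's JdbcTemplate wired to a DataSource; the table names and values are illustrative):

JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
// Each statement may target a different (date-named) table,
// yet they are still sent as one batch
jdbcTemplate.batchUpdate(
    "INSERT INTO \"30/12/2008\" (cost) VALUES (123)",
    "INSERT INTO \"31/12/2008\" (cost) VALUES (456)");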

java - Multiple update statements in MySQL

I have a piece of software which basically downloads 1.5K game server addresses from my MySQL db. It then pings all of them and uploads information such as online players back to the database. The process looks like this:
Download server addresses
Ping the servers and get information
Upload information back to the database
So far I have been able to solve the part where it downloads the server host names and pings them, but the problem arises when updating the servers.
To update, I thought about using a for loop to construct one BIG string of many update statements and executing it at once, but this is prone to SQL injection. So ideally one would want to use prepared statements.
The SQL update statement I'm using is:
UPDATE serverlist SET `onlineplayers` = '3', maxplayers = '10',
name = 'A game server' WHERE `ip` = 'xxx.xxx.xxx.xxx' AND `port` = 1234;
So my question is: how can I execute all 1.5K update statements using parameterized queries?
If you google for "jdbc bulk update" you'll get lots of results like this one or this one.
The latter has an example like this:
try {
    ...
    con.setAutoCommit(false);
    PreparedStatement prepStmt = con.prepareStatement(
        "UPDATE DEPT SET MGRNO=? WHERE DEPTNO=?");
    prepStmt.setString(1, mgrnum1);
    prepStmt.setString(2, deptnum1);
    prepStmt.addBatch();
    prepStmt.setString(1, mgrnum2);
    prepStmt.setString(2, deptnum2);
    prepStmt.addBatch();
    int[] numUpdates = prepStmt.executeBatch();
    for (int i = 0; i < numUpdates.length; i++) {
        if (numUpdates[i] == -2)
            System.out.println("Execution " + i +
                ": unknown number of rows updated");
        else
            System.out.println("Execution " + i +
                " successful: " + numUpdates[i] + " rows updated");
    }
    con.commit();
} catch (BatchUpdateException b) {
    // process BatchUpdateException
}
Sounds like you want to do a batch SQL update. Prepared statements are your friend. Here's an example of using prepared statements in batch:
http://www.mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/
Using prepared statements makes setting parameters easier and it allows the DB to efficiently perform multiple updates. Executing multiple SQL strings would work but would be inefficient since each SQL string would be sent to the DBMS, parsed, compiled, then executed. With prepared statements the SQL is parsed and compiled once then reused for future updates with different parameters.
Another important step to be aware of for MySQL batch updates / inserts is the JDBC connection property rewriteBatchedStatements=true (false by default). Without it, batch mode is close to useless.
It cost me a day of "bug fixing" until I found this out.
With a small number of rows and the client close to the DB (1 ms ping) you may not even notice that you are in "fake batch mode", but when I switched to a remote client (ping = 100 ms) with 100k rows to update, the "batch mode" update took 4 hours with the default rewriteBatchedStatements=false and just 2 minutes with rewriteBatchedStatements=true.
Create a prepared statement:
String sql = "update serverlist SET onlineplayers = ?, maxplayers = ?, name = ? where ip = ? and port = ?";
PreparedStatement stmt = connection.prepareStatement(sql);
Then loop through your list, and at each iteration, do
stmt.setInt(1, onlinePlayers);
stmt.setInt(2, maxPlayers);
stmt.setString(3, name);
stmt.setString(4, ip);
stmt.setInt(5, port);
stmt.executeUpdate();
For better performance, you could also use batch updates.
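A sketch of the batched variant (GameServer is a hypothetical holder for the ping results; the flush size of 500 is arbitrary):

connection.setAutoCommit(false);
int queued = 0;
for (GameServer server : servers) {
    stmt.setInt(1, server.getOnlinePlayers());
    stmt.setInt(2, server.getMaxPlayers());
    stmt.setString(3, server.getName());
    stmt.setString(4, server.getIp());
    stmt.setInt(5, server.getPort());
    stmt.addBatch();
    // Flush every 500 queued updates to keep the batch size bounded
    if (++queued % 500 == 0) {
        stmt.executeBatch();
    }
}
stmt.executeBatch(); // flush the remainder
connection.commit();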
Read the JDBC tutorial.
