String insert1 = "INSERT INTO Table1(Col1, col2, col3)"
+ "VALUES(?,?,?)";
String insert2 = "INSERT INTO Table2(Colx, coly)"
+ "VALUES(?,?)";
Connection conn = aConn;
PreparedStatement ps = null;
try {
ps = conn.prepareStatement(insert1);
// ps.addBatch(insert2);
I'm trying to insert data into multiple tables at a time, and it seems that addBatch(String sql) is not defined for PreparedStatement.
Is there an alternative way?
First of all, a PreparedStatement caches a single SQL statement. This has the advantage that the driver/database can optimize it, since it expects the statement to be executed many times and since it is parameterized. If you want to use two different SQL statements, you need two PreparedStatements.
In order to add rows to the statement you need to set your parameters using set*(1,...), set*(2,...), set*(3,...), etc. and then call addBatch() (no arguments!). Finally you submit the batch of statements using executeBatch().
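A minimal sketch of that flow, using the two insert strings from the question (the bound values and the try-with-resources structure are my additions, not from the original post):

try (PreparedStatement ps1 = conn.prepareStatement(insert1);
     PreparedStatement ps2 = conn.prepareStatement(insert2)) {
    ps1.setString(1, "a");   // Col1
    ps1.setString(2, "b");   // col2
    ps1.setString(3, "c");   // col3
    ps1.addBatch();          // no argument: queues the row bound above

    ps2.setString(1, "x");   // Colx
    ps2.setString(2, "y");   // coly
    ps2.addBatch();

    ps1.executeBatch();      // sends every row queued on ps1
    ps2.executeBatch();      // sends every row queued on ps2
}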
Related
I need to insert a couple hundred million records into a MySQL database. I'm batch inserting 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES(?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }

    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();
    System.out.println("inserted " + updateCounts.length);
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now let me explain why rewriteBatchedStatements=true improves performance so dramatically: it rewrites batched prepared INSERT statements into a single multi-value INSERT when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on the MySQL general query log (SET GLOBAL general_log = 1), which logs every statement sent to the MySQL server to a file.
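For example, a small sketch of enabling the log from JDBC (the log file path is hypothetical, and the MySQL user needs the privilege to set global variables):

try (Statement log = connection.createStatement()) {
    log.execute("SET GLOBAL log_output = 'FILE'");
    log.execute("SET GLOBAL general_log_file = '/tmp/mysql-general.log'"); // hypothetical path
    log.execute("SET GLOBAL general_log = 1");
}
// Run the batch insert, then inspect the log file to see whether the driver
// sent one multi-value INSERT or n individual INSERT statements.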
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (JDBC's .addBatch() may do a similar optimization now, though the MySQL addBatch used to be entirely unoptimized and just issued individual queries anyway; I don't know whether that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus tens of millions of individual inserts.
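A hedged sketch of that approach from JDBC (the file path is hypothetical; it assumes the connection was opened with allowLoadLocalInfile=true for MySQL Connector/J and that the server permits local_infile):

try (Statement stmt = connection.createStatement()) {
    // Each line of the CSV is loaded into the single column used in the question
    stmt.execute("LOAD DATA LOCAL INFILE '/tmp/data.csv' INTO TABLE mytable "
            + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' (xxx)");
}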
If:
It's a new table, or the amount to be inserted is greater than the data already in the table
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
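A hedged sketch of wrapping the bulk insert that way (note this primarily helps MyISAM tables; for InnoDB the effect is limited, so treat the benefit as an assumption to verify):

try (Statement stmt = connection.createStatement()) {
    stmt.execute("ALTER TABLE mytable DISABLE KEYS");
    // ... run the batched INSERTs here ...
    stmt.execute("ALTER TABLE mytable ENABLE KEYS"); // rebuilds the non-unique indexes in one pass
}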
You may try using a DDBulkLoad object.

// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);

    int maxInsertBatch = 10000;

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES(?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        // Flush every maxInsertBatch rows instead of accumulating
        // millions of statements in the driver's memory
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }

    // Execute whatever is left in the final, partial batch
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + count);
I'm writing a webpage that takes input from a form, sends it through CGI to a Java file, inserts the input into a database through SQL, and then prints out the database. I'm having trouble inserting into the database using variables, though, and I was wondering if anyone would be able to help me out.
String a1Insert = (String)form.get("a1");
String a2Insert = (String)form.get("a2");
This is where I get my variables from the form (just believe that it works; there's a bunch more back end, but I've used this before and I know it's getting the variables correctly).
String dbURL = "jdbc:derby://blah.blahblah.ca:CSE2014;user=blah;password=blarg";
Connection conn = DriverManager.getConnection(dbURL);
Statement stmt = conn.createStatement();
stmt.executeUpdate("set schema course");
stmt.executeUpdate("INSERT INTO MEMBER VALUES (a1Insert, a2Insert)");
stmt.close();
This is where I try to insert into the database. It gives me the error:
Column 'A1INSERT' is either not in any table in the FROM list or appears within a join specification and is outside the scope of the join specification or appears in a HAVING clause and is not in the GROUP BY list. If this is a CREATE or ALTER TABLE statement then 'A1INSERT' is not a column in the target table.
If anyone has any ideas that would be lovely ^.^ Thanks
java.sql.Statement doesn't support parameters; switching to java.sql.PreparedStatement will allow you to set them. Replace the variable names in your SQL with ?, and call the setter methods on the prepared statement to assign a value to each parameter. This will look something like
String sql = "INSERT INTO MEMBER VALUES (?, ?)";
PreparedStatement stmt = con.prepareStatement(sql);
stmt.setString(1, "a1");
stmt.setString(2, "a2");
stmt.executeUpdate();
That will execute the SQL
INSERT INTO MEMBER VALUES ('a1', 'a2')
Notice that the parameter indexes start from 1, not 0. Also notice I didn't have to put quotes around the strings; the PreparedStatement did it for me.
Alternatively you could keep using Statement and build your SQL string in Java code, but that introduces the possibility of SQL injection attacks. Using PreparedStatement to set parameters avoids that issue by taking care of quoting for you: if it finds a quote in the parameter value, it escapes it so that it cannot affect the SQL statement it is included in.
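As a hedged illustration (the value "O'Brien" is my example, not from the question), a parameter containing a quote is handled safely:

String risky = "O'Brien";
PreparedStatement ps = con.prepareStatement("INSERT INTO MEMBER VALUES (?, ?)");
ps.setString(1, risky);   // the driver deals with the embedded quote,
ps.setString(2, "a2");    // so it cannot break out of the SQL statement
ps.executeUpdate();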
Oracle has a tutorial here.
I am using MySQL Database. The following piece creates a record and gets the id from the created record:
insertStmt = connection
.prepareStatement("INSERT INTO bugs (summary, status, report_date) VALUES (?, ?, ? )");
//...
insertStmt.executeUpdate();
idQuery = connection.prepareStatement("SELECT LAST_INSERT_ID()");
rs = idQuery.executeQuery();
if (rs != null) {
    rs.next();
    return new Long(rs.getLong(1)).toString();
}
Now, suppose two threads execute this and their execution is interleaved: the first thread inserts a record, then the second thread inserts one, and only then does the first thread call LAST_INSERT_ID(). The value will be wrong for the first thread, because the second thread has already inserted a record.
This could be overcome with synchronization, but is there a way to execute the two statements in a single database call?
LAST_INSERT_ID works per-connection, and as your question states you can have a race condition if two statements in two threads use the same connection.
You have two ways around this:
1: Use a separate connection per thread (not easy, but this is really the best option for scaling and correctness; use connection pooling)
2: Use the form of executeUpdate that records the auto-generated key in the same API call, so you can read it back later using getGeneratedKeys. That way you don't need LAST_INSERT_ID in a second query, which avoids the race condition. There's a similar form of prepareStatement that you can use with prepared statements.
Option 2 is probably what you want in the short term. The link in option 2 goes straight to that API. This link is a MySQL article outlining how to use it.
According to https://dev.mysql.com/doc/refman/5.7/en/connector-j-reference-configuration-properties.html, you should be able to add ?allowMultiQueries=true to your JDBC connection string. Then you would be able to pass multiple statements, separated by semicolons, in Statement#execute(String sql) calls.
Edit: or, use a stored procedure that does what you want. Or, as you said, synchronize the Java code.
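If you go the stored-procedure route, a hedged sketch might look like this (the procedure name insert_bug and its parameters are hypothetical; it is assumed to run the INSERT and then SELECT LAST_INSERT_ID()):

CallableStatement cs = connection.prepareCall("{call insert_bug(?, ?, ?)}");
cs.setString(1, "some summary");                              // summary
cs.setString(2, "OPEN");                                      // status
cs.setDate(3, new java.sql.Date(System.currentTimeMillis())); // report_date
ResultSet rs = cs.executeQuery();   // result set returned by the procedure's SELECT
long id = rs.next() ? rs.getLong(1) : -1;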
You can try using a multi-query, combining the INSERT and the SELECT LAST_INSERT_ID() in the same string.
1) Prepare the connection to allow multi-queries:
"jdbc:mysql://"+host+"/"+database+"?allowMultiQueries=true"
2) Combine the INSERT with the SELECT:
String multiQuerySqlString = "INSERT INTO bugs (summary, status, report_date) VALUES (1, 2, 3); SELECT LAST_INSERT_ID()";
3) Execute the query and expect multiple result sets:
Statement statement = connection.createStatement();
boolean isResultSet = statement.execute(multiQuerySqlString);
// The INSERT produces an update count first, so advance to the next result
isResultSet = statement.getMoreResults();
// Second result: the ResultSet produced by SELECT LAST_INSERT_ID()
ResultSet res = statement.getResultSet();
I hope it works
If you have to do this all on a single connection you can ask the driver to return the generated ID:
insertStmt = connection.prepareStatement("...", Statement.RETURN_GENERATED_KEYS);
insertStmt.executeUpdate();
ResultSet rs = insertStmt.getGeneratedKeys();
Long id = null;
if (rs != null && rs.next())
{
    id = rs.getLong(1);
}
connection.commit();
return id;
Depending on the driver you might need a different prepareStatement() call that takes the column names as the second parameter:
insertStmt = connection.prepareStatement("INSERT ", new String[] {"ID"});
But even with the above code, you should do the concurrent inserts on different physical connections to be able to properly control your transactions.
I want to batch up multiple types of database calls in one PreparedStatement. Is this possible?
Is there anyway to do something like
PreparedStatement pstmt = connection.prepareStatement("?");
where the ? could be either INSERT INTO MY_TABLE VALUES(1,2,3,4) or UPDATE MY_TABLE SET MY_VAL='1' WHERE MY_VAL IS NULL
Or do I always need to specify a table and action for my prepared statement?
JDBC will not let you pass just ? as the PreparedStatement's SQL string; the ? placeholders stand only for parameter values within a given SQL statement, not for the statement itself.
For your case, you will need two PreparedStatement objects, and inside your loop you can decide which one to add to. It would look something like this:
PreparedStatement insertPstmt = connection.prepareStatement("INSERT INTO MY_TABLE VALUES(?,?,?,?)");
PreparedStatement updatePstmt = connection.prepareStatement("UPDATE MY_TABLE SET MY_VAL=? WHERE MY_VAL IS NULL");

while (<condition>) {
    if (<insert condition>) {
        // bind parameters on insertPstmt and call insertPstmt.addBatch()
    } else {
        // bind parameters on updatePstmt and call updatePstmt.addBatch()
    }
}

insertPstmt.executeBatch();
updatePstmt.executeBatch();
If any of your inserts depend on the updates, execute the batches in the order that satisfies that dependency, so the statements apply correctly. I would execute the inserts first, as they are unlikely to depend on the updates. A concrete (hypothetical) version of the loop is sketched below.
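For illustration only (the records collection, the MyRecord type, its getters, and the column layout of MY_TABLE are assumptions I'm making up, not part of the question):

for (MyRecord r : records) {
    if (r.isNew()) {                          // assumed "insert condition"
        insertPstmt.setInt(1, r.getId());
        insertPstmt.setString(2, r.getName());
        insertPstmt.setString(3, r.getType());
        insertPstmt.setString(4, r.getValue());
        insertPstmt.addBatch();
    } else {
        updatePstmt.setString(1, r.getValue());
        updatePstmt.addBatch();
    }
}
// then call executeBatch() on both statements, as above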
On a PreparedStatement, after binding the variables for the first execution, call
pstmt.addBatch();
then bind the variables for the next, and each time calling addBatch().
Then, when you're done adding batches, you execute the batch by calling
pstmt.executeBatch();
See :
http://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#addBatch%28%29
and
http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeBatch%28%29
BTW: injecting the entire statement as a parameter won't work. This batch mechanism exists to reuse the same statement, binding different variables each time.
INSERT and UPDATE commands don't return any data that has to be processed. If you only want to do things like in your examples, you can simply execute a single concatenated string of all your SQL statements separated by semicolons:
"INSERT INTO MY_TABLE VALUES(1,2,3,4)" + ";" + "UPDATE MY_TABLE SET MY_VAL='1' WHERE MY_VAL IS NULL" + ";" + ...
You don't need to prepare the statement in that case, and you wouldn't gain any performance by doing so.
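A minimal sketch of that approach (it assumes a MySQL connection whose URL includes allowMultiQueries=true; without something like that, most drivers reject multiple statements in a single call):

try (Statement stmt = connection.createStatement()) {
    // Both statements are sent to the server in one round trip
    stmt.execute("INSERT INTO MY_TABLE VALUES(1,2,3,4); "
            + "UPDATE MY_TABLE SET MY_VAL='1' WHERE MY_VAL IS NULL");
}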