I'm trying to find the faster way to do batch insert.
I tried to insert several batches with jdbcTemplate.update(String sql), where
sql was builded by StringBuilder and looks like:
INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3), ... , (1,2,3)
Batch size was exactly 1000. I inserted nearly 100 batches.
I checked the time using StopWatch and found out insert time:
min[38ms], avg[50ms], max[190ms] per batch
I was glad but I wanted to make my code better.
After that, I tried to use jdbcTemplate.batchUpdate in way like:
jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
#Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
// ...
}
#Override
public int getBatchSize() {
return 1000;
}
});
where sql was look like
INSERT INTO TABLE(x, y, i) VALUES(1,2,3);
and I was disappointed! jdbcTemplate executed every single insert of 1000 lines batch in separated way. I loked at mysql_log and found there a thousand inserts.
I checked the time using StopWatch and found out insert time:
min[900ms], avg[1100ms], max[2000ms] per Batch
So, can anybody explain to me, why jdbcTemplate doing separated inserts in this method? Why method's name is batchUpdate?
Or may be I am using this method in wrong way?
These parameters in the JDBC connection URL can make a big difference in the speed of batched statements --- in my experience, they speed things up:
?useServerPrepStmts=false&rewriteBatchedStatements=true
See: JDBC batch insert performance
I found a major improvement setting the argTypes array in the call.
In my case, with Spring 4.1.4 and Oracle 12c, for insertion of 5000 rows with 35 fields:
jdbcTemplate.batchUpdate(insert, parameters); // Take 7 seconds
jdbcTemplate.batchUpdate(insert, parameters, argTypes); // Take 0.08 seconds!!!
The argTypes param is an int array where you set each field in this way:
int[] argTypes = new int[35];
argTypes[0] = Types.VARCHAR;
argTypes[1] = Types.VARCHAR;
argTypes[2] = Types.VARCHAR;
argTypes[3] = Types.DECIMAL;
argTypes[4] = Types.TIMESTAMP;
.....
I debugged org\springframework\jdbc\core\JdbcTemplate.java and found that most of the time was consumed trying to know the nature of each field, and this was made for each record.
Hope this helps !
I have also faced the same issue with Spring JDBC template. Probably with Spring Batch the statement was executed and committed on every insert or on chunks, that slowed things down.
I have replaced the jdbcTemplate.batchUpdate() code with original JDBC batch insertion code and found the Major performance improvement.
DataSource ds = jdbcTemplate.getDataSource();
Connection connection = ds.getConnection();
connection.setAutoCommit(false);
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
PreparedStatement ps = connection.prepareStatement(sql);
final int batchSize = 1000;
int count = 0;
for (Employee employee: employees) {
ps.setString(1, employee.getName());
ps.setString(2, employee.getCity());
ps.setString(3, employee.getPhone());
ps.addBatch();
++count;
if(count % batchSize == 0 || count == employees.size()) {
ps.executeBatch();
ps.clearBatch();
}
}
connection.commit();
ps.close();
Check this link as well
JDBC batch insert performance
Simply use transaction. Add #Transactional on method.
Be sure to declare the correct TX manager if using several datasources #Transactional("dsTxManager"). I have a case where inserting 60000 records. It takes about 15s. No other tweak:
#Transactional("myDataSourceTxManager")
public void save(...) {
...
jdbcTemplate.batchUpdate(query, new BatchPreparedStatementSetter() {
#Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
...
}
#Override
public int getBatchSize() {
if(data == null){
return 0;
}
return data.size();
}
});
}
Change your sql insert to INSERT INTO TABLE(x, y, i) VALUES(1,2,3). The framework creates a loop for you.
For example:
public void insertBatch(final List<Customer> customers){
String sql = "INSERT INTO CUSTOMER " +
"(CUST_ID, NAME, AGE) VALUES (?, ?, ?)";
getJdbcTemplate().batchUpdate(sql, new BatchPreparedStatementSetter() {
#Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
Customer customer = customers.get(i);
ps.setLong(1, customer.getCustId());
ps.setString(2, customer.getName());
ps.setInt(3, customer.getAge() );
}
#Override
public int getBatchSize() {
return customers.size();
}
});
}
IF you have something like this. Spring will do something like:
for(int i = 0; i < getBatchSize(); i++){
execute the prepared statement with the parameters for the current iteration
}
The framework first creates PreparedStatement from the query (the sql variable) then the setValues method is called and the statement is executed. that is repeated as much times as you specify in the getBatchSize() method. So the right way to write the insert statement is with only one values clause.
You can take a look at http://docs.spring.io/spring/docs/3.0.x/reference/jdbc.html
I had also some bad time with Spring JDBC batch template. In my case, it would be, like, insane to use pure JDBC, so instead I used NamedParameterJdbcTemplate. This was a must have in my project. But it was way slow to insert hundreds os thousands of lines in the database.
To see what was going on, I've sampled it with VisualVM during the batch update and, voilà:
What was slowing the process was that, while setting the parameters, Spring JDBC was querying the database to know the metadata each parameter. And seemed to me that it was querying the database for each parameter for each line every time. So I just taught Spring to ignore the parameter types (as it is warned in the Spring documentation about batch operating a list of objects):
#Bean(name = "named-jdbc-tenant")
public synchronized NamedParameterJdbcTemplate getNamedJdbcTemplate(#Autowired TenantRoutingDataSource tenantDataSource) {
System.setProperty("spring.jdbc.getParameterType.ignore", "true");
return new NamedParameterJdbcTemplate(tenantDataSource);
}
Note: the system property must be set before creating the JDBC Template object. It would be possible to just set in the application.properties, but this solved and I've never after touched this again
I don't know if this will work for you, but here's a Spring-free way that I ended up using. It was significantly faster than the various Spring methods I tried. I even tried using the JDBC template batch update method the other answer describes, but even that was slower than I wanted. I'm not sure what the deal was and the Internets didn't have many answers either. I suspected it had to do with how commits were being handled.
This approach is just straight JDBC using the java.sql packages and PreparedStatement's batch interface. This was the fastest way that I could get 24M records into a MySQL DB.
I more or less just built up collections of "record" objects and then called the below code in a method that batch inserted all the records. The loop that built the collections was responsible for managing the batch size.
I was trying to insert 24M records into a MySQL DB and it was going ~200 records per second using Spring batch. When I switched to this method, it went up to ~2500 records per second. so my 24M record load went from a theoretical 1.5 days to about 2.5 hours.
First create a connection...
Connection conn = null;
try{
Class.forName("com.mysql.jdbc.Driver");
conn = DriverManager.getConnection(connectionUrl, username, password);
}catch(SQLException e){}catch(ClassNotFoundException e){}
Then create a prepared statement and load it with batches of values for insert, and then execute as a single batch insert...
PreparedStatement ps = null;
try{
conn.setAutoCommit(false);
ps = conn.prepareStatement(sql); // INSERT INTO TABLE(x, y, i) VALUES(1,2,3)
for(MyRecord record : records){
try{
ps.setString(1, record.getX());
ps.setString(2, record.getY());
ps.setString(3, record.getI());
ps.addBatch();
} catch (Exception e){
ps.clearParameters();
logger.warn("Skipping record...", e);
}
}
ps.executeBatch();
conn.commit();
} catch (SQLException e){
} finally {
if(null != ps){
try {ps.close();} catch (SQLException e){}
}
}
Obviously I've removed error handling and the query and Record object is notional and whatnot.
Edit:
Since your original question was comparing the insert into foobar values (?,?,?), (?,?,?)...(?,?,?) method to Spring batch, here's a more direct response to that:
It looks like your original method is likely the fastest way to do bulk data loads into MySQL without using something like the "LOAD DATA INFILE" approach. A quote from the MysQL docs (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements.
You could modify the Spring JDBC Template batchUpdate method to do an insert with multiple VALUES specified per 'setValues' call, but you'd have to manually keep track of the index values as you iterate over the set of things being inserted. And you'd run into a nasty edge case at the end when the total number of things being inserted isn't a multiple of the number of VALUES lists you have in your prepared statement.
If you use the approach I outline, you could do the same thing (use a prepared statement with multiple VALUES lists) and then when you get to that edge case at the end, it's a little easier to deal with because you can build and execute one last statement with exactly the right number of VALUES lists. It's a bit hacky, but most optimized things are.
Solution given by #Rakesh worked for me.
Significant improvement in performance. Earlier time was 8 min, with this solution taking less than 2 min.
DataSource ds = jdbcTemplate.getDataSource();
Connection connection = ds.getConnection();
connection.setAutoCommit(false);
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
PreparedStatement ps = connection.prepareStatement(sql);
final int batchSize = 1000;
int count = 0;
for (Employee employee: employees) {
ps.setString(1, employee.getName());
ps.setString(2, employee.getCity());
ps.setString(3, employee.getPhone());
ps.addBatch();
++count;
if(count % batchSize == 0 || count == employees.size()) {
ps.executeBatch();
ps.clearBatch();
}
}
connection.commit();
ps.close();
Encountered some serious performance issue with JdbcBatchItemWriter.write() (link) from Spring Batch and find out the write logic delegates to JdbcTemplate.batchUpdate() eventually.
Adding a Java system properties of spring.jdbc.getParameterType.ignore=true fixed the performance issue entirely ( from 200 records per second to ~ 5000 ).
The patch was tested working on both Postgresql and MsSql (might not be dialect specific)
... and ironically, Spring documented this behaviour under a "note" section link
In such a scenario, with automatic setting of values on an underlying PreparedStatement, the corresponding JDBC type for each value needs to be derived from the given Java type. While this usually works well, there is a potential for issues (for example, with Map-contained null values). Spring, by default, calls ParameterMetaData.getParameterType in such a case, which can be expensive with your JDBC driver. You should use a recent driver version and consider setting the spring.jdbc.getParameterType.ignore property to true (as a JVM system property or in a spring.properties file in the root of your classpath) if you encounter a performance issue — for example, as reported on Oracle 12c (SPR-16139).
Alternatively, you might consider specifying the corresponding JDBC
types explicitly, either through a 'BatchPreparedStatementSetter' (as
shown earlier), through an explicit type array given to a
'List<Object[]>' based call, through 'registerSqlType' calls on a
custom 'MapSqlParameterSource' instance, or through a
'BeanPropertySqlParameterSource' that derives the SQL type from the
Java-declared property type even for a null value.
Related
Need to insert records using spring jdbctemplate batch update. while inserting if duplicate record is found, the record needs to be updated else inserted. how do i do that?
Below is my code.
Note:have not included exception handling.
result = jdbcTemplate.batchUpdate(
"insert ignore xxx set yy = ?, zz = ? where aa = ?",
new BatchPreparedStatementSetter() {
public void setValues(PreparedStatement ps, int i) throws SQLException {
ps.setDouble(1, Double.parseDouble(new JSONObject(jsonArray.get(i).toString()).get("aa").toString()));
ps.setDouble(2, Double.parseDouble(new JSONObject(jsonArray.get(i).toString()).get("bb").toString()));
ps.setString(3, new JSONObject(jsonArray.get(i).toString()).get("cc").toString());
}
public int getBatchSize() {
return jsonArray.length();
}
} );
}
You could use one of the following choices:
as #fbokovikov said you can use merge sql command.
you can check if the record is exist or not. In case of performance you can first get all the keys from db and then generate correct insert and update queries using those keys. In this scenario you should be aware of poor performance if you have big data in your table.
you can delete records first and then insert them all. This is very bad in performance. :-)
Hope that helps
Need to insert records using spring jdbctemplate batch update. while
inserting if duplicate record is found, the record needs to be updated
else inserted
For this purpose you should use standard SQL-2003 command merge.
In MySQL this command has following syntax:
INSERT...ON DUPLICATE KEY UPDATE
I am using Oracle 11g. I have 3 tables(A,B,C) in my database A <one-many> B <many-one> C
I have a piece of code, that performs three inserts: firstly in A and C, after that in B. This piece of code is executed a lot of times(200000) and makes 200000 insert operations in each table.
I have two ways to make an insertion:
jdbc PreparedStatement:
DataSource ds = jdbcTemplate.getDataSource();
try (Connection connection = ds.getConnection();
PreparedStatement statement = connection.prepareStatement(sql1);
PreparedStatement statement2 = connection.prepareStatement(sql2);
PreparedStatement statement3 = connection.prepareStatement(sql3);) {
connection.setAutoCommit(false);
final int batchSize = 20;
int count = 0;
for (int i=1; i<= total; i++ ) {
// Define sql parameters
statement.setString(1, p1);
statement2.setString(1, p2);
statement2.setString(2, p3);
statement3.setInt(1, p4);
statement3.setString(2, p5);
statement.addBatch();
statement2.addBatch();
statement3.addBatch();
if (++count % batchSize == 0) {
statement.executeBatch();
statement.clearBatch();
statement2.executeBatch();
statement2.clearBatch();
statement3.executeBatch();
statement3.clearBatch();
connection.commit();
System.out.println(i);
}
}
statement.executeBatch();
statement.clearBatch();
statement2.executeBatch();
statement2.clearBatch();
statement3.executeBatch();
statement3.clearBatch();
connection.commit();
}
catch (SQLException e) {
e.printStackTrace();
}
}
Spring jdbcTemplate:
List<String> bulkLoadRegistrationSql = new ArrayList<String>(20);
for (int i=1; i<= total; i++ ) {
// 1. Define sql parameters p1,p2,p,3p4,p5
// 2. Prepare sql using parameters from 1
String sql1String = ...
String sql2String = ...
String sql3String = ...
bulkLoadRegistrationSql.add(sql1String);
bulkLoadRegistrationSql.add(sql2String);
bulkLoadRegistrationSql.add(sql3String);
if (i % 20 == 0) {
jdbcTemplate.batchUpdate(bulkLoadRegistrationSql
.toArray(new String[bulkLoadRegistrationSql.size()]));
//Clear inserted batch
bulkLoadRegistrationSql = new ArrayList<String>(20);
}
}
I measured execution time for total = 200000 and results are very confusing for me.
Spring jdbcTemplate is executed in 1480 seconds,
jdbc PreparedStatement in 200 seconds
I looked into jdbcTemplate source and found, that it uses Statement underneath, which should be less efficient than PreparedStatement. However the difference in results is too big and I am not sure if this happens just because of the difference between Statement and PreparedStatement. What are your ideas on that? Should the results theoretically be equaled if jdbcTemplate is replaced on namedParameterJdbcTemplate?
Yes it should be much closer, assuming the majority of the time was spent waiting for the responses from the database. Spring has its own overhead so you will have some more resource consumption on the client side.
In a prepared statement using placeholders, Oracle only parses the SQL once, and generates the plan once. It then caches the parse results, along with the plan for the SQL. In your JDBCTemplate example, each SQL statement looks different to the parser and will therefore require a full parse and plan generation by the server. Depending on your Oracle server's horsepower, this will result in an increased response time for each SQL statement. For 200,000 SQL statements, a net increase of 1280 seconds translates into an additional 6.4 milliseconds per call. That, to me, seems like a reasonable increase due to the additional parsing required.
I suggest adding some timing information to the database calls, so you can confirm that the SQL response time is lower in the improved version.
Spring JDBCTemplate also has methods that use prepared Statement. Please Refer this link Your test results make sense - you cannot compare execution using prepared statement and Ordinary SQL.
There are overloaded batchUpdate methods in Spring JDBCTemplate that uses Prepared statements for example the below function.
int[][] batchUpdate(String sql, final Collection batchArgs, final
int batchSize, final ParameterizedPreparedStatementSetter pss)
For any given SQL, 50 to 80 percent of a database engine's time is spent computing access paths.
When you use PreparedStatement directly, the database engine computes the access path once and returns a handle to the already computed access path (in the "prepare" phase). When the prepared statement is invoked, the database engine only has to apply the parameters to the already prepared access path and return the cursor.
I am trying to update a MSSQL instance using JDBC using a prepared statement, I made a method to update any record in the table when given the column name, the value to update, and the updated value.
public void updateProjectOptions(int projectID, int number, String column){
try {
PreparedStatement ps = conn.prepareStatement("UPDATE cryptic.dbo.projects SET ? = ? WHERE project_id = ?");
int newNum = number+1;
System.out.println(projectID+" "+newNum+" "+column);
ps.setString(1, column);
ps.setInt(2, newNum);
ps.setInt(3, projectID);
int debug = ps.executeUpdate();
System.out.println("Rows affected: "+debug);
} catch (SQLException ex) {
Logger.getLogger(DAL.class.getName()).log(Level.SEVERE, null, ex);
}
}
The first print statement is printing out the correct values so I know the inputs are correct, and the second print statement is letting me know that 1 row is affected which is correct.
If I run the script inside of Management Studio the script runs fine and updates the table, but if I run the script from the java project nothing is updated and no errors are generated.
The db table in question has 4 columns: (int)project_id, (nvarchar)project_name, (int)num_bugs, (int)num_features
Can anyone help me out with getting this to work and/or spot whats wrong?
You can't bind a column name that way, only variables.
I would recommend that you close that PreparedStatement in method scope in a finally block. Your way is asking for trouble.
I would also call writing to System.out a very bad idea. I'd prefer returning the number of affected rows to the user.
Column names cannot be parameterized in prepared statements. You can parameterize only literal values like strings or numbers.
The scenario is like this:
for loop // runs say 200000 times
{
// here, i do a select from a database, fetching few rows which are expected to increase with every new iteration of for loop
// currently i am doing this select using simple JDBC call (using JDBC only is NOT a requirement)
// then i do some string matching stuff and then i either insert or update a particular row (in 95% cases i will insert)
// this insert or update is being done using Hibernate (using Hibernate over here is a requirement)
}
So the problem is, in every loop, I have to consider the each and every previously inserted/updated row. Due to this requirement, I have to do a JDBC call in each and every loop. And this JDBC call is taking the maximum time, bringing down the performance.
I want to know, is there any method using which I do not have to make a JDBC call in each iteration, but still I will be able to consider all the records including the one in the just previous insert/update? Anything like caching or some in-memory data structure or something like that?
Here is the code:
for loop // runs say 2000 times
{
String query = pdi.selectAllPatients(patientInfo);
Statement st = conn.createStatement();
ResultSet patientRs = st.executeQuery(query);
while (patientRs.hasNext())
{
// some string ops
}
// Create session for DB No.2
Session sessionEmpi = sessionFactoryEmpi.getCurrentSession();
sessionEmpi.beginTransaction();
if(some condition)
patientDao.insertPatient(patientInfo, sessionEmpi);
else
patientDao.insertref(patientInfo.getref(), sessionEmpi);
conn.commit();
}
public int insertPatient(PatientInfo input, Session session) throws SQLException {
try {
session.save(input.getPatient());
session.flush();
session.save(input.getref());
session.getTransaction().commit();
return 1;
} catch (Exception ex) {
session.getTransaction().rollback();
ex.printStackTrace();
return 0;
}
}
Is the performance of the SELECT consistent? Unless your data is fairly small, you'll likely have trouble caching all your changes in memory. You may also be able to batch the SELECTs, effectively unrolling the loop.
You can use the PreparedStatement interface instead of Statement interface as it avoids the unnecessary calls for firing the query to the database you just have to bind the data in for loop this will help you to improve performance!!
example:
PreparedStatement s =con.prepareStatement("select * from student_master where stu_id = ?");
for()
{
s.setString(1,"s002");
ResultSet rs = s.executeQuery();
}
I'm using the JDBC template and want to read from a database using prepared statements. I iterate over many lines in a .csv file, and on every line I execute some SQL select queries with corresponding values.
I want to speed up my reading from the database but I don't know how to get the JDBC template to work with prepared statements.
There is the PreparedStatementCreator and the PreparedStatementSetter. As in this example both of them are created with anonymous inner classes.
But inside the PreparedStatementSetter class I don't have access to the values I want to set in the prepared statement.
Since I'm iterating through a .csv file, I can't hard code them as a String because I don't know them.
I also can't pass them to the PreparedStatementSetter because there are no arguments for the constructor. And setting my values to final would be dumb too.
I was used to the creation of prepared statements being fairly simple. Something like
PreparedStatement updateSales = con.prepareStatement(
"UPDATE COFFEES SET SALES = ? WHERE COF_NAME LIKE ? ");
updateSales.setInt(1, 75);
updateSales.setString(2, "Colombian");
updateSales.executeUpdate():
as in this Java tutorial.
By default, the JDBCTemplate does its own PreparedStatement internally, if you just use the .update(String sql, Object ... args) form. Spring, and your database, will manage the compiled query for you, so you don't have to worry about opening, closing, resource protection, etc. One of the saving graces of Spring. A link to Spring 2.5's documentation on this. Hope it makes things clearer. Also, statement caching can be done at the JDBC level, as in the case of at least some of Oracle's JDBC drivers.
That will go into a lot more detail than I can competently.
class Main {
public static void main(String args[]) throws Exception {
ApplicationContext ac = new
ClassPathXmlApplicationContext("context.xml", Main.class);
DataSource dataSource = (DataSource) ac.getBean("dataSource");
// DataSource mysqlDataSource = (DataSource) ac.getBean("mysqlDataSource");
JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
String prasobhName =
jdbcTemplate.query(
"select first_name from customer where last_name like ?",
new PreparedStatementSetter() {
public void setValues(PreparedStatement preparedStatement) throws
SQLException {
preparedStatement.setString(1, "nair%");
}
},
new ResultSetExtractor<Long>() {
public Long extractData(ResultSet resultSet) throws SQLException,
DataAccessException {
if (resultSet.next()) {
return resultSet.getLong(1);
}
return null;
}
}
);
System.out.println(machaceksName);
}
}
Try the following:
PreparedStatementCreator creator = new PreparedStatementCreator() {
#Override
public PreparedStatement createPreparedStatement(Connection con) throws SQLException {
PreparedStatement updateSales = con.prepareStatement(
"UPDATE COFFEES SET SALES = ? WHERE COF_NAME LIKE ? ");
updateSales.setInt(1, 75);
updateSales.setString(2, "Colombian");
return updateSales;
}
};
I'd factor out the prepared statement handling to at least a method. In this case, because there are no results it is fairly simple (and assuming that the connection is an instance variable that doesn't change):
private PreparedStatement updateSales;
public void updateSales(int sales, String cof_name) throws SQLException {
if (updateSales == null) {
updateSales = con.prepareStatement(
"UPDATE COFFEES SET SALES = ? WHERE COF_NAME LIKE ?");
}
updateSales.setInt(1, sales);
updateSales.setString(2, cof_name);
updateSales.executeUpdate();
}
At that point, it is then just a matter of calling:
updateSales(75, "Colombian");
Which is pretty simple to integrate with other things, yes? And if you call the method many times, the update will only be constructed once and that will make things much faster. Well, assuming you don't do crazy things like doing each update in its own transaction...
Note that the types are fixed. This is because for any particular query/update, they should be fixed so as to allow the database to do its job efficiently. If you're just pulling arbitrary strings from a CSV file, pass them in as strings. There's also no locking; far better to keep individual connections to being used from a single thread instead.
I've tried a select statement now with a PreparedStatement, but it turned out that it was not faster than the Jdbc template. Maybe, as mezmo suggested, it automatically creates prepared statements.
Anyway, the reason for my sql SELECTs being so slow was another one. In the WHERE clause I always used the operator LIKE, when all I wanted to do was finding an exact match. As I've found out LIKE searches for a pattern and therefore is pretty slow.
I'm using the operator = now and it's much faster.