How to clear a batch in JOOQ? - java

I am trying to reuse a prepared statement when executing multi-inserts. Something like
InsertValuesStepN<Record> batch = create.insertInto(table, fields);
for(int i=0; i<100000; i++) {
batch.values();
if(i % 1000 == 0) {
batch.execute();
// need to call clearBatch here so we don't insert records twice
}
}
but I don't see any way to have InsertValuesStepN clear it's records after calling execute. Is this possible?

Create a new statement for each batch:
You could create a new statement on each batch, instead of reusing the previous one.
InsertValuesStepN<Record> batch = null;
for(int i=0; i<100000; i++) {
if (batch == null)
batch = create.insertInto(table, fields);
batch.values();
if(i % 1000 == 0) {
batch.execute();
batch = null;
}
}
Using the bind() API
This is usually not recommended because of performance issue #6616, but since your bottleneck (as per your comment) is with creating a new prepared statement, you might try to use the Query.bind() API which you could use on your 2nd, 3rd, etc. batch to replace existing bind values with new ones in an existing query. Call Query.bind() like this:
// Create the initial statement with dummy values for your batch
Query batch = create.insertInto(table, fields).values(...).keepStatement(true);
for(int i=0; i<100000; i += 1000) {
// Call this once for each bind value
batch.bind(...);
batch.execute();
// Handle the last insertions, where you have less than 1000 rows per insert
// ...
}
Proxy JDBC
You could implement a proxy JDBC PreparedStatement that doesn't actually close the delegate statement when jOOQ calls PreparedStatement.close(), but keeps it open and offers the same statement again to jOOQ when jOOQ tries to prepare it again.
There's a pending feature request to offer such a PreparedStatement cache out of the box: https://github.com/jOOQ/jOOQ/issues/7327, or maybe, your JDBC driver already has one (e.g. Oracle does).
Using the import API
But perhaps, you're actually using for jOOQ's import API, which allows for specifying the batch, bulk, and commit sizes, e.g.
create
.loadInto(table)
.bulkAfter(1000)
.loadArrays(...) // There are other possible data sources
.fields(fields)
.execute();

Related

Hibernate is not responding while querying

How to handle if Hibernate queries take too long to return the result. I have already configured a query time out but on debugging it shows that the DB is responding by returning data, but hibernate fails to map the given data.
I do not want this scenario to happen in production, because my query might fail since the hibernate is not responding back.
I need a solution to come out from this scenario.
setProperty("javax.persistence.query.timeout", 180000);
JPAQuery query = queryFactory.select(....)
do{
List<Tuple> data = query.fetch().limit(5000);
//--------
} while(flag)
The above code works fine with data which are less in size, but for some data sets/ conditions the data is huge and eventually hibernate is not responding.
Try to follow these steps, if
Use Lazy Fetching instead of Eager Fetching like #ManyToMany(mappedBy="authors", fetch=FetchType.LAZY)
Or May be check if any of these Mistakes
You are using HibernateDaoSupport.getSession(), without ever returning them using releaseSession() (as described in the javadocs).
a) use HibernateDaoSupport.getHibernateTemplate() to cleanly create/destroy sessions
b) use getSession()/releaseSession() in a finally block
c) forget about HibernateDaoSupport, define transactions and use sessionFactory.getCurrentSession()
use, session.refresh(entity) or entityManager.refresh(entity) (if you use JPA) will give you fresh data from DB.
1) For setting the timeout in Hibernate query you can set hint "javax.persistence.query.timeout"
Code snippet ::
List<Test> test= em.createQuery("SELECT * FROM Test t")
.setHint("javax.persistence.query.timeout", 1)
.getResultList();
2) In case 2 columns are containing large data ,you can use CLOB and BLOB types for huge dataset.
Based on your last comment, you are looking for a way to manage a timeout for certains queries.
You can achieve this while creating your org.hibernate.Query with Hibernate:
Query queryObject = //initialize your query as you need;
queryObject.setTimeout(10); //that int represents the seconds of timeouts.
Hope this helps
Finally i couldn't find any direct way to get control over a hibernate query call.
The time out was not working for me since the Postgres already returned the result set but hibernate was taking time in mapping (correct me if i am wrong) due to data size.
Following piece of code saved me.
ExecutorService executor = Executors.newSingleThreadExecutor();
List<Future<List<Tuple>>> futureData = executor.invokeAll(Arrays.asList(new QueryService(params...)), 2, TimeUnit.MINUTES);
executor.shutdown();
for (Future future : futureData) {
try {
data = (List<Tuple>) future.get();
} catch (CancellationException e) {
if (EXPORT_LIMIT > 1000) {
EXPORT_LIMIT = 1000;
} else if (EXPORT_LIMIT > 500) {
EXPORT_LIMIT = 500;
} else if (EXPORT_LIMIT > 100) {
EXPORT_LIMIT = 100;
} else {
throw e;
}
isValid = false;
break;
}
}
So basically my default fetch limit of 5000 if not working, then i keep trying till 100.
If fetch size of 100 is also failing an exception will be thrown.
Thank you.

Spring jdbcTemplate vs PreparedStatement. Performance difference

I am using Oracle 11g. I have 3 tables(A,B,C) in my database A <one-many> B <many-one> C
I have a piece of code, that performs three inserts: firstly in A and C, after that in B. This piece of code is executed a lot of times(200000) and makes 200000 insert operations in each table.
I have two ways to make an insertion:
jdbc PreparedStatement:
DataSource ds = jdbcTemplate.getDataSource();
try (Connection connection = ds.getConnection();
PreparedStatement statement = connection.prepareStatement(sql1);
PreparedStatement statement2 = connection.prepareStatement(sql2);
PreparedStatement statement3 = connection.prepareStatement(sql3);) {
connection.setAutoCommit(false);
final int batchSize = 20;
int count = 0;
for (int i=1; i<= total; i++ ) {
// Define sql parameters
statement.setString(1, p1);
statement2.setString(1, p2);
statement2.setString(2, p3);
statement3.setInt(1, p4);
statement3.setString(2, p5);
statement.addBatch();
statement2.addBatch();
statement3.addBatch();
if (++count % batchSize == 0) {
statement.executeBatch();
statement.clearBatch();
statement2.executeBatch();
statement2.clearBatch();
statement3.executeBatch();
statement3.clearBatch();
connection.commit();
System.out.println(i);
}
}
statement.executeBatch();
statement.clearBatch();
statement2.executeBatch();
statement2.clearBatch();
statement3.executeBatch();
statement3.clearBatch();
connection.commit();
}
catch (SQLException e) {
e.printStackTrace();
}
}
Spring jdbcTemplate:
List<String> bulkLoadRegistrationSql = new ArrayList<String>(20);
for (int i=1; i<= total; i++ ) {
// 1. Define sql parameters p1,p2,p,3p4,p5
// 2. Prepare sql using parameters from 1
String sql1String = ...
String sql2String = ...
String sql3String = ...
bulkLoadRegistrationSql.add(sql1String);
bulkLoadRegistrationSql.add(sql2String);
bulkLoadRegistrationSql.add(sql3String);
if (i % 20 == 0) {
jdbcTemplate.batchUpdate(bulkLoadRegistrationSql
.toArray(new String[bulkLoadRegistrationSql.size()]));
//Clear inserted batch
bulkLoadRegistrationSql = new ArrayList<String>(20);
}
}
I measured execution time for total = 200000 and results are very confusing for me.
Spring jdbcTemplate is executed in 1480 seconds,
jdbc PreparedStatement in 200 seconds
I looked into jdbcTemplate source and found, that it uses Statement underneath, which should be less efficient than PreparedStatement. However the difference in results is too big and I am not sure if this happens just because of the difference between Statement and PreparedStatement. What are your ideas on that? Should the results theoretically be equaled if jdbcTemplate is replaced on namedParameterJdbcTemplate?
Yes it should be much closer, assuming the majority of the time was spent waiting for the responses from the database. Spring has its own overhead so you will have some more resource consumption on the client side.
In a prepared statement using placeholders, Oracle only parses the SQL once, and generates the plan once. It then caches the parse results, along with the plan for the SQL. In your JDBCTemplate example, each SQL statement looks different to the parser and will therefore require a full parse and plan generation by the server. Depending on your Oracle server's horsepower, this will result in an increased response time for each SQL statement. For 200,000 SQL statements, a net increase of 1280 seconds translates into an additional 6.4 milliseconds per call. That, to me, seems like a reasonable increase due to the additional parsing required.
I suggest adding some timing information to the database calls, so you can confirm that the SQL response time is lower in the improved version.
Spring JDBCTemplate also has methods that use prepared Statement. Please Refer this link Your test results make sense - you cannot compare execution using prepared statement and Ordinary SQL.
There are overloaded batchUpdate methods in Spring JDBCTemplate that uses Prepared statements for example the below function.
int[][] batchUpdate(String sql, final Collection batchArgs, final
int batchSize, final ParameterizedPreparedStatementSetter pss)
For any given SQL, 50 to 80 percent of a database engine's time is spent computing access paths.
When you use PreparedStatement directly, the database engine computes the access path once and returns a handle to the already computed access path (in the "prepare" phase). When the prepared statement is invoked, the database engine only has to apply the parameters to the already prepared access path and return the cursor.

Why Spring's jdbcTemplate.batchUpdate() so slow?

I'm trying to find the faster way to do batch insert.
I tried to insert several batches with jdbcTemplate.update(String sql), where
sql was builded by StringBuilder and looks like:
INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3), ... , (1,2,3)
Batch size was exactly 1000. I inserted nearly 100 batches.
I checked the time using StopWatch and found out insert time:
min[38ms], avg[50ms], max[190ms] per batch
I was glad but I wanted to make my code better.
After that, I tried to use jdbcTemplate.batchUpdate in way like:
jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
#Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
// ...
}
#Override
public int getBatchSize() {
return 1000;
}
});
where sql was look like
INSERT INTO TABLE(x, y, i) VALUES(1,2,3);
and I was disappointed! jdbcTemplate executed every single insert of 1000 lines batch in separated way. I loked at mysql_log and found there a thousand inserts.
I checked the time using StopWatch and found out insert time:
min[900ms], avg[1100ms], max[2000ms] per Batch
So, can anybody explain to me, why jdbcTemplate doing separated inserts in this method? Why method's name is batchUpdate?
Or may be I am using this method in wrong way?
These parameters in the JDBC connection URL can make a big difference in the speed of batched statements --- in my experience, they speed things up:
?useServerPrepStmts=false&rewriteBatchedStatements=true
See: JDBC batch insert performance
I found a major improvement setting the argTypes array in the call.
In my case, with Spring 4.1.4 and Oracle 12c, for insertion of 5000 rows with 35 fields:
jdbcTemplate.batchUpdate(insert, parameters); // Take 7 seconds
jdbcTemplate.batchUpdate(insert, parameters, argTypes); // Take 0.08 seconds!!!
The argTypes param is an int array where you set each field in this way:
int[] argTypes = new int[35];
argTypes[0] = Types.VARCHAR;
argTypes[1] = Types.VARCHAR;
argTypes[2] = Types.VARCHAR;
argTypes[3] = Types.DECIMAL;
argTypes[4] = Types.TIMESTAMP;
.....
I debugged org\springframework\jdbc\core\JdbcTemplate.java and found that most of the time was consumed trying to know the nature of each field, and this was made for each record.
Hope this helps !
I have also faced the same issue with Spring JDBC template. Probably with Spring Batch the statement was executed and committed on every insert or on chunks, that slowed things down.
I have replaced the jdbcTemplate.batchUpdate() code with original JDBC batch insertion code and found the Major performance improvement.
DataSource ds = jdbcTemplate.getDataSource();
Connection connection = ds.getConnection();
connection.setAutoCommit(false);
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
PreparedStatement ps = connection.prepareStatement(sql);
final int batchSize = 1000;
int count = 0;
for (Employee employee: employees) {
ps.setString(1, employee.getName());
ps.setString(2, employee.getCity());
ps.setString(3, employee.getPhone());
ps.addBatch();
++count;
if(count % batchSize == 0 || count == employees.size()) {
ps.executeBatch();
ps.clearBatch();
}
}
connection.commit();
ps.close();
Check this link as well
JDBC batch insert performance
Simply use transaction. Add #Transactional on method.
Be sure to declare the correct TX manager if using several datasources #Transactional("dsTxManager"). I have a case where inserting 60000 records. It takes about 15s. No other tweak:
#Transactional("myDataSourceTxManager")
public void save(...) {
...
jdbcTemplate.batchUpdate(query, new BatchPreparedStatementSetter() {
#Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
...
}
#Override
public int getBatchSize() {
if(data == null){
return 0;
}
return data.size();
}
});
}
Change your sql insert to INSERT INTO TABLE(x, y, i) VALUES(1,2,3). The framework creates a loop for you.
For example:
public void insertBatch(final List<Customer> customers){
String sql = "INSERT INTO CUSTOMER " +
"(CUST_ID, NAME, AGE) VALUES (?, ?, ?)";
getJdbcTemplate().batchUpdate(sql, new BatchPreparedStatementSetter() {
#Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
Customer customer = customers.get(i);
ps.setLong(1, customer.getCustId());
ps.setString(2, customer.getName());
ps.setInt(3, customer.getAge() );
}
#Override
public int getBatchSize() {
return customers.size();
}
});
}
IF you have something like this. Spring will do something like:
for(int i = 0; i < getBatchSize(); i++){
execute the prepared statement with the parameters for the current iteration
}
The framework first creates PreparedStatement from the query (the sql variable) then the setValues method is called and the statement is executed. that is repeated as much times as you specify in the getBatchSize() method. So the right way to write the insert statement is with only one values clause.
You can take a look at http://docs.spring.io/spring/docs/3.0.x/reference/jdbc.html
I had also some bad time with Spring JDBC batch template. In my case, it would be, like, insane to use pure JDBC, so instead I used NamedParameterJdbcTemplate. This was a must have in my project. But it was way slow to insert hundreds os thousands of lines in the database.
To see what was going on, I've sampled it with VisualVM during the batch update and, voilà:
What was slowing the process was that, while setting the parameters, Spring JDBC was querying the database to know the metadata each parameter. And seemed to me that it was querying the database for each parameter for each line every time. So I just taught Spring to ignore the parameter types (as it is warned in the Spring documentation about batch operating a list of objects):
#Bean(name = "named-jdbc-tenant")
public synchronized NamedParameterJdbcTemplate getNamedJdbcTemplate(#Autowired TenantRoutingDataSource tenantDataSource) {
System.setProperty("spring.jdbc.getParameterType.ignore", "true");
return new NamedParameterJdbcTemplate(tenantDataSource);
}
Note: the system property must be set before creating the JDBC Template object. It would be possible to just set in the application.properties, but this solved and I've never after touched this again
I don't know if this will work for you, but here's a Spring-free way that I ended up using. It was significantly faster than the various Spring methods I tried. I even tried using the JDBC template batch update method the other answer describes, but even that was slower than I wanted. I'm not sure what the deal was and the Internets didn't have many answers either. I suspected it had to do with how commits were being handled.
This approach is just straight JDBC using the java.sql packages and PreparedStatement's batch interface. This was the fastest way that I could get 24M records into a MySQL DB.
I more or less just built up collections of "record" objects and then called the below code in a method that batch inserted all the records. The loop that built the collections was responsible for managing the batch size.
I was trying to insert 24M records into a MySQL DB and it was going ~200 records per second using Spring batch. When I switched to this method, it went up to ~2500 records per second. so my 24M record load went from a theoretical 1.5 days to about 2.5 hours.
First create a connection...
Connection conn = null;
try{
Class.forName("com.mysql.jdbc.Driver");
conn = DriverManager.getConnection(connectionUrl, username, password);
}catch(SQLException e){}catch(ClassNotFoundException e){}
Then create a prepared statement and load it with batches of values for insert, and then execute as a single batch insert...
PreparedStatement ps = null;
try{
conn.setAutoCommit(false);
ps = conn.prepareStatement(sql); // INSERT INTO TABLE(x, y, i) VALUES(1,2,3)
for(MyRecord record : records){
try{
ps.setString(1, record.getX());
ps.setString(2, record.getY());
ps.setString(3, record.getI());
ps.addBatch();
} catch (Exception e){
ps.clearParameters();
logger.warn("Skipping record...", e);
}
}
ps.executeBatch();
conn.commit();
} catch (SQLException e){
} finally {
if(null != ps){
try {ps.close();} catch (SQLException e){}
}
}
Obviously I've removed error handling and the query and Record object is notional and whatnot.
Edit:
Since your original question was comparing the insert into foobar values (?,?,?), (?,?,?)...(?,?,?) method to Spring batch, here's a more direct response to that:
It looks like your original method is likely the fastest way to do bulk data loads into MySQL without using something like the "LOAD DATA INFILE" approach. A quote from the MysQL docs (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements.
You could modify the Spring JDBC Template batchUpdate method to do an insert with multiple VALUES specified per 'setValues' call, but you'd have to manually keep track of the index values as you iterate over the set of things being inserted. And you'd run into a nasty edge case at the end when the total number of things being inserted isn't a multiple of the number of VALUES lists you have in your prepared statement.
If you use the approach I outline, you could do the same thing (use a prepared statement with multiple VALUES lists) and then when you get to that edge case at the end, it's a little easier to deal with because you can build and execute one last statement with exactly the right number of VALUES lists. It's a bit hacky, but most optimized things are.
Solution given by #Rakesh worked for me.
Significant improvement in performance. Earlier time was 8 min, with this solution taking less than 2 min.
DataSource ds = jdbcTemplate.getDataSource();
Connection connection = ds.getConnection();
connection.setAutoCommit(false);
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
PreparedStatement ps = connection.prepareStatement(sql);
final int batchSize = 1000;
int count = 0;
for (Employee employee: employees) {
ps.setString(1, employee.getName());
ps.setString(2, employee.getCity());
ps.setString(3, employee.getPhone());
ps.addBatch();
++count;
if(count % batchSize == 0 || count == employees.size()) {
ps.executeBatch();
ps.clearBatch();
}
}
connection.commit();
ps.close();
Encountered some serious performance issue with JdbcBatchItemWriter.write() (link) from Spring Batch and find out the write logic delegates to JdbcTemplate.batchUpdate() eventually.
Adding a Java system properties of spring.jdbc.getParameterType.ignore=true fixed the performance issue entirely ( from 200 records per second to ~ 5000 ).
The patch was tested working on both Postgresql and MsSql (might not be dialect specific)
... and ironically, Spring documented this behaviour under a "note" section link
In such a scenario, with automatic setting of values on an underlying PreparedStatement, the corresponding JDBC type for each value needs to be derived from the given Java type. While this usually works well, there is a potential for issues (for example, with Map-contained null values). Spring, by default, calls ParameterMetaData.getParameterType in such a case, which can be expensive with your JDBC driver. You should use a recent driver version and consider setting the spring.jdbc.getParameterType.ignore property to true (as a JVM system property or in a spring.properties file in the root of your classpath) if you encounter a performance issue — for example, as reported on Oracle 12c (SPR-16139).
Alternatively, you might consider specifying the corresponding JDBC
types explicitly, either through a 'BatchPreparedStatementSetter' (as
shown earlier), through an explicit type array given to a
'List<Object[]>' based call, through 'registerSqlType' calls on a
custom 'MapSqlParameterSource' instance, or through a
'BeanPropertySqlParameterSource' that derives the SQL type from the
Java-declared property type even for a null value.

Avoiding JDBC call

The scenario is like this:
for loop // runs say 200000 times
{
// here, i do a select from a database, fetching few rows which are expected to increase with every new iteration of for loop
// currently i am doing this select using simple JDBC call (using JDBC only is NOT a requirement)
// then i do some string matching stuff and then i either insert or update a particular row (in 95% cases i will insert)
// this insert or update is being done using Hibernate (using Hibernate over here is a requirement)
}
So the problem is, in every loop, I have to consider the each and every previously inserted/updated row. Due to this requirement, I have to do a JDBC call in each and every loop. And this JDBC call is taking the maximum time, bringing down the performance.
I want to know, is there any method using which I do not have to make a JDBC call in each iteration, but still I will be able to consider all the records including the one in the just previous insert/update? Anything like caching or some in-memory data structure or something like that?
Here is the code:
for loop // runs say 2000 times
{
String query = pdi.selectAllPatients(patientInfo);
Statement st = conn.createStatement();
ResultSet patientRs = st.executeQuery(query);
while (patientRs.hasNext())
{
// some string ops
}
// Create session for DB No.2
Session sessionEmpi = sessionFactoryEmpi.getCurrentSession();
sessionEmpi.beginTransaction();
if(some condition)
patientDao.insertPatient(patientInfo, sessionEmpi);
else
patientDao.insertref(patientInfo.getref(), sessionEmpi);
conn.commit();
}
public int insertPatient(PatientInfo input, Session session) throws SQLException {
try {
session.save(input.getPatient());
session.flush();
session.save(input.getref());
session.getTransaction().commit();
return 1;
} catch (Exception ex) {
session.getTransaction().rollback();
ex.printStackTrace();
return 0;
}
}
Is the performance of the SELECT consistent? Unless your data is fairly small, you'll likely have trouble caching all your changes in memory. You may also be able to batch the SELECTs, effectively unrolling the loop.
You can use the PreparedStatement interface instead of Statement interface as it avoids the unnecessary calls for firing the query to the database you just have to bind the data in for loop this will help you to improve performance!!
example:
PreparedStatement s =con.prepareStatement("select * from student_master where stu_id = ?");
for()
{
s.setString(1,"s002");
ResultSet rs = s.executeQuery();
}

Creating SQL batch updates from within Java

I want to update every row on a specific column in a mySql database. Currently I am using a java.sql.PreparedStatement for each row and iterating in a for loop. I was wondering if there were any other alternatives in terms of Java programming to make this less time and resource consuming (something like executing the prepared statements in a batch). The updates are made from Java code because that is where I get the values from. I am also not interested in making stored procedures on the server as I do not have the rights for that.
Here is a link to an example that uses Java's prepared statement to execute a batch update. I also included the sample from the site for quick reference.
http://www.exampledepot.com/egs/java.sql/BatchUpdate.html
try {
// Disable auto-commit
connection.setAutoCommit(false);
// Create a prepared statement
String sql = "INSERT INTO my_table VALUES(?)";
PreparedStatement pstmt = connection.prepareStatement(sql);
// Insert 10 rows of data
for (int i=0; i<10; i++) {
pstmt.setString(1, ""+i);
pstmt.addBatch();
}
// Execute the batch
int [] updateCounts = pstmt.executeBatch();
// All statements were successfully executed.
// updateCounts contains one element for each batched statement.
// updateCounts[i] contains the number of rows affected by that statement.
processUpdateCounts(updateCounts);
// Since there were no errors, commit
connection.commit();
} catch (BatchUpdateException e) {
// Not all of the statements were successfully executed
int[] updateCounts = e.getUpdateCounts();
// Some databases will continue to execute after one fails.
// If so, updateCounts.length will equal the number of batched statements.
// If not, updateCounts.length will equal the number of successfully executed statements
processUpdateCounts(updateCounts);
// Either commit the successfully executed statements or rollback the entire batch
connection.rollback();
} catch (SQLException e) {
}
public static void processUpdateCounts(int[] updateCounts) {
for (int i=0; i<updateCounts.length; i++) {
if (updateCounts[i] >= 0) {
// Successfully executed; the number represents number of affected rows
} else if (updateCounts[i] == Statement.SUCCESS_NO_INFO) {
// Successfully executed; number of affected rows not available
} else if (updateCounts[i] == Statement.EXECUTE_FAILED) {
// Failed to execute
}
}
}
If you're using MySQL, I believe the short answer to your question is "No". There's nothing you can do that will be any faster.
Indeed, even the prepared statement gains you nothing. Perhaps this is changed with newer versions, but last I checked (several years ago), MySQL just turns prepared statements into regular statements anyway. Nothing is cached.

Categories