Creating SQL batch updates from within Java

I want to update every row of a specific column in a MySQL database. Currently I am using a java.sql.PreparedStatement for each row and iterating in a for loop. I was wondering if there are any alternatives in Java to make this less time- and resource-consuming (something like executing the prepared statements in a batch). The updates are made from Java code because that is where I get the values from. I am also not interested in creating stored procedures on the server, as I do not have the rights for that.

Here is a link to an example that uses a Java prepared statement to execute a batch update; I have also included the sample from the site for quick reference.
http://www.exampledepot.com/egs/java.sql/BatchUpdate.html
try {
    // Disable auto-commit
    connection.setAutoCommit(false);

    // Create a prepared statement
    String sql = "INSERT INTO my_table VALUES(?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    // Insert 10 rows of data
    for (int i = 0; i < 10; i++) {
        pstmt.setString(1, "" + i);
        pstmt.addBatch();
    }

    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();

    // All statements were successfully executed.
    // updateCounts contains one element for each batched statement;
    // updateCounts[i] contains the number of rows affected by that statement.
    processUpdateCounts(updateCounts);

    // Since there were no errors, commit
    connection.commit();
} catch (BatchUpdateException e) {
    // Not all of the statements were successfully executed
    int[] updateCounts = e.getUpdateCounts();

    // Some databases will continue to execute after one fails.
    // If so, updateCounts.length will equal the number of batched statements.
    // If not, updateCounts.length will equal the number of successfully executed statements.
    processUpdateCounts(updateCounts);

    // Either commit the successfully executed statements or roll back the entire batch
    connection.rollback();
} catch (SQLException e) {
    // Handle (or at least log) the error; an empty catch block hides failures
    e.printStackTrace();
}

public static void processUpdateCounts(int[] updateCounts) {
    for (int i = 0; i < updateCounts.length; i++) {
        if (updateCounts[i] >= 0) {
            // Successfully executed; the number represents the number of affected rows
        } else if (updateCounts[i] == Statement.SUCCESS_NO_INFO) {
            // Successfully executed; number of affected rows not available
        } else if (updateCounts[i] == Statement.EXECUTE_FAILED) {
            // Failed to execute
        }
    }
}

If you're using MySQL, I believe the short answer to your question is "No". There's nothing you can do that will be significantly faster.
Indeed, even the prepared statement gains you nothing. Perhaps this has changed with newer versions, but last I checked (several years ago), MySQL just turned prepared statements into regular statements anyway; nothing was cached.
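That said, if you want to experiment, MySQL's Connector/J driver exposes two connection properties that are often cited for batch performance: useServerPrepStmts (server-side prepared statements are off by default, matching the behavior described above) and rewriteBatchedStatements (lets the driver rewrite a batch into fewer round trips). A minimal sketch; the URL, schema, and credentials are placeholders, and whether these flags help depends on your driver and server versions:
import java.sql.Connection;
import java.sql.DriverManager;

String url = "jdbc:mysql://localhost:3306/mydb"
        + "?useServerPrepStmts=true&rewriteBatchedStatements=true";
try (Connection connection = DriverManager.getConnection(url, "user", "password")) {
    // run the batched PreparedStatement code from the example above
}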

Related

How to clear a batch in JOOQ?

I am trying to reuse a prepared statement when executing multi-inserts. Something like:
InsertValuesStepN<Record> batch = create.insertInto(table, fields);
for (int i = 0; i < 100000; i++) {
    batch.values();
    if (i % 1000 == 0) {
        batch.execute();
        // need to call clearBatch here so we don't insert records twice
    }
}
but I don't see any way to have InsertValuesStepN clear its records after calling execute. Is this possible?
Create a new statement for each batch
You could create a new statement for each batch, instead of reusing the previous one.
InsertValuesStepN<Record> batch = null;
for (int i = 0; i < 100000; i++) {
    if (batch == null)
        batch = create.insertInto(table, fields);
    batch.values();
    if ((i + 1) % 1000 == 0) {
        batch.execute();
        batch = null;
    }
}
// Flush any remaining rows if the total is not a multiple of 1000
if (batch != null)
    batch.execute();
Using the bind() API
This is usually not recommended because of performance issue #6616, but since your bottleneck (as per your comment) is creating a new prepared statement, you might try the Query.bind() API, which you can use on your 2nd, 3rd, etc. batch to replace the existing bind values with new ones in an existing query. Call Query.bind() like this:
// Create the initial statement with dummy values for your batch
Query batch = create.insertInto(table, fields).values(...).keepStatement(true);
for (int i = 0; i < 100000; i += 1000) {
    // Call this once for each bind value
    batch.bind(...);
    batch.execute();
}
// Handle the last insertions, where you have fewer than 1000 rows per insert
// ...
Proxy JDBC
You could implement a proxy JDBC PreparedStatement that doesn't actually close the delegate statement when jOOQ calls PreparedStatement.close(), but keeps it open and offers the same statement again to jOOQ when jOOQ tries to prepare it again.
There's a pending feature request to offer such a PreparedStatement cache out of the box: https://github.com/jOOQ/jOOQ/issues/7327. Or maybe your JDBC driver already has one (e.g. Oracle's does).
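In the meantime, one way to sketch such a proxy without implementing the entire PreparedStatement interface is a java.lang.reflect.Proxy that swallows close(). This is only an illustration of the idea, not jOOQ's or any driver's actual cache; cache keying, thread safety, and eventual real cleanup are deliberately left out:
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;

public final class KeepOpenStatement {
    // Wraps a PreparedStatement so that close() becomes a no-op;
    // the delegate stays open and can be handed out again on the next prepare.
    public static PreparedStatement keepOpen(PreparedStatement delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            if ("close".equals(method.getName())) {
                return null; // swallow close(); the caller will reuse the statement
            }
            return method.invoke(delegate, args);
        };
        return (PreparedStatement) Proxy.newProxyInstance(
                PreparedStatement.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                handler);
    }
}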
Using the import API
But perhaps you're actually looking for jOOQ's import API, which allows for specifying the batch, bulk, and commit sizes, e.g.
create.loadInto(table)
      .bulkAfter(1000)
      .loadArrays(...) // There are other possible data sources
      .fields(fields)
      .execute();

Getting affected row count when there is no resultset object

I have been given a PL/SQL procedure that I am supposed to call from Java code through JDBC.
public boolean deleteCompany(Timestamp timestamp, Long companyId) throws SQLException {
    String str = "{call Delete_Company(?,?)}";
    CallableStatement statement = null;
    boolean deleted = false;
    try {
        statement = conn.prepareCall(str);
        statement.setLong(1, companyId);
        statement.setTimestamp(2, timestamp);
        statement.execute();
        deleted = true;
        return deleted;
    } finally {
        if (statement != null) {
            statement.close();
        }
    }
}
The problem is that even if I send a wrong id number, the statement still executes, so the variable deleted becomes true. I tried the executeUpdate() method to get the affected row count, but it did not work properly: it returned 1 in both cases (whether or not a deletion happened). I suppose the problem is that the PL/SQL procedure just performs the delete operation but does not return any result set, so neither executeUpdate() nor getUpdateCount() helps me. My question: is there any way to get the affected row count even though I have no result set object?
FYI: I understand that the affected row count could be sent as an OUT parameter from the PL/SQL procedure, but I have no authority to make any changes on the DB side.
Since you can't change the stored procedure, one solution is to do the following (see the sketch below):
1. Get the number of rows before calling the delete operation (SELECT COUNT(*) FROM your_tbl WHERE ...).
2. Delete the records as you're already doing.
3. Get the number of rows after the delete; the number of affected rows is the difference (num_rows#1 - num_rows#3).
Other transactions can still make this approach somewhat unreliable, because they can also change your table between steps #1 and #3. If that's a concern for you, then you should use transactions (just place your Java code in a transaction).
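A minimal sketch of that idea; the table name and WHERE clause are placeholders for whatever rows the procedure actually deletes:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

long countRows(Connection conn, long companyId) throws SQLException {
    // Placeholder table/column; use whatever Delete_Company actually touches
    String sql = "SELECT COUNT(*) FROM companies WHERE company_id = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setLong(1, companyId);
        try (ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getLong(1);
        }
    }
}

// Usage around the procedure call:
long before = countRows(conn, companyId);
deleteCompany(timestamp, companyId);                  // calls the stored procedure
long affected = before - countRows(conn, companyId);  // rows actually deleted
boolean deleted = affected > 0;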
Your statement.execute() returns a boolean, but that boolean only tells you whether the first result is a ResultSet; it does not tell you whether the procedure actually deleted anything. You can refer to the code below for what you are looking for:
...
boolean hadResults = cStmt.execute();

// Process all returned result sets
while (hadResults) {
    ResultSet rs = cStmt.getResultSet();
    // process result set
    ...
    hadResults = cStmt.getMoreResults();
}

// Retrieve output parameters
// Connector/J supports both index-based and name-based retrieval
int outputValue = cStmt.getInt(2);        // index-based
outputValue = cStmt.getInt("inOutParam"); // name-based
...

Spring jdbcTemplate vs PreparedStatement. Performance difference

I am using Oracle 11g. I have three tables (A, B, C) in my database: A <one-many> B <many-one> C.
I have a piece of code that performs three inserts: first into A and C, and then into B. This piece of code is executed many times (200,000) and makes 200,000 insert operations on each table.
I have two ways to make an insertion:
jdbc PreparedStatement:
DataSource ds = jdbcTemplate.getDataSource();
try (Connection connection = ds.getConnection();
     PreparedStatement statement = connection.prepareStatement(sql1);
     PreparedStatement statement2 = connection.prepareStatement(sql2);
     PreparedStatement statement3 = connection.prepareStatement(sql3)) {
    connection.setAutoCommit(false);
    final int batchSize = 20;
    int count = 0;
    for (int i = 1; i <= total; i++) {
        // Define sql parameters
        statement.setString(1, p1);
        statement2.setString(1, p2);
        statement2.setString(2, p3);
        statement3.setInt(1, p4);
        statement3.setString(2, p5);
        statement.addBatch();
        statement2.addBatch();
        statement3.addBatch();
        if (++count % batchSize == 0) {
            statement.executeBatch();
            statement.clearBatch();
            statement2.executeBatch();
            statement2.clearBatch();
            statement3.executeBatch();
            statement3.clearBatch();
            connection.commit();
            System.out.println(i);
        }
    }
    // Flush the final, possibly partial batch
    statement.executeBatch();
    statement.clearBatch();
    statement2.executeBatch();
    statement2.clearBatch();
    statement3.executeBatch();
    statement3.clearBatch();
    connection.commit();
} catch (SQLException e) {
    e.printStackTrace();
}
Spring jdbcTemplate:
List<String> bulkLoadRegistrationSql = new ArrayList<String>(20);
for (int i = 1; i <= total; i++) {
    // 1. Define sql parameters p1, p2, p3, p4, p5
    // 2. Prepare sql using parameters from 1
    String sql1String = ...
    String sql2String = ...
    String sql3String = ...
    bulkLoadRegistrationSql.add(sql1String);
    bulkLoadRegistrationSql.add(sql2String);
    bulkLoadRegistrationSql.add(sql3String);
    if (i % 20 == 0) {
        jdbcTemplate.batchUpdate(bulkLoadRegistrationSql
                .toArray(new String[bulkLoadRegistrationSql.size()]));
        // Clear inserted batch
        bulkLoadRegistrationSql = new ArrayList<String>(20);
    }
}
I measured execution time for total = 200000 and the results are very confusing to me.
Spring jdbcTemplate executed in 1480 seconds, jdbc PreparedStatement in 200 seconds.
I looked into the jdbcTemplate source and found that it uses a plain Statement underneath, which should be less efficient than PreparedStatement. However, the difference in results is too big, and I am not sure if this happens just because of the difference between Statement and PreparedStatement. What are your ideas on that? Should the results theoretically be equal if jdbcTemplate is replaced with NamedParameterJdbcTemplate?
Yes it should be much closer, assuming the majority of the time was spent waiting for the responses from the database. Spring has its own overhead so you will have some more resource consumption on the client side.
In a prepared statement using placeholders, Oracle only parses the SQL once, and generates the plan once. It then caches the parse results, along with the plan for the SQL. In your JDBCTemplate example, each SQL statement looks different to the parser and will therefore require a full parse and plan generation by the server. Depending on your Oracle server's horsepower, this will result in an increased response time for each SQL statement. For 200,000 SQL statements, a net increase of 1280 seconds translates into an additional 6.4 milliseconds per call. That, to me, seems like a reasonable increase due to the additional parsing required.
I suggest adding some timing information to the database calls, so you can confirm that the SQL response time is lower in the improved version.
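For example, a crude but often sufficient way to time the database calls (a sketch; wrap whichever statement you want to measure):
long start = System.nanoTime();
statement.executeBatch(); // or jdbcTemplate.batchUpdate(...)
long elapsedMs = (System.nanoTime() - start) / 1_000_000;
System.out.println("executeBatch took " + elapsedMs + " ms");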
Spring's JdbcTemplate also has methods that use a PreparedStatement. Your test results make sense: you cannot compare execution using a prepared statement against ordinary SQL strings.
There are overloaded batchUpdate methods in JdbcTemplate that use prepared statements, for example:
<T> int[][] batchUpdate(String sql, Collection<T> batchArgs, int batchSize, ParameterizedPreparedStatementSetter<T> pss)
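A sketch of how that overload might be used for one of your inserts; the SQL, row type, and sample data are placeholders:
import java.sql.PreparedStatement;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

List<String> names = List.of("a", "b", "c"); // stand-in for your 200,000 rows
int[][] counts = jdbcTemplate.batchUpdate(
        "INSERT INTO my_table (name) VALUES (?)",
        names,
        20, // batch size, matching your JDBC example
        (PreparedStatement ps, String name) -> ps.setString(1, name));
This reuses one PreparedStatement per batch, so Oracle parses the SQL once, which is exactly the advantage your raw JDBC version had.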
For any given SQL, 50 to 80 percent of a database engine's time is spent computing access paths.
When you use PreparedStatement directly, the database engine computes the access path once and returns a handle to the already computed access path (in the "prepare" phase). When the prepared statement is invoked, the database engine only has to apply the parameters to the already prepared access path and return the cursor.

UPDATE PreparedStatement not updating MSSQL using JDBC

I am trying to update an MSSQL instance over JDBC using a prepared statement. I made a method to update any record in the table when given the column name, the current value, and the project id.
public void updateProjectOptions(int projectID, int number, String column) {
    try {
        PreparedStatement ps = conn.prepareStatement(
                "UPDATE cryptic.dbo.projects SET ? = ? WHERE project_id = ?");
        int newNum = number + 1;
        System.out.println(projectID + " " + newNum + " " + column);
        ps.setString(1, column);
        ps.setInt(2, newNum);
        ps.setInt(3, projectID);
        int debug = ps.executeUpdate();
        System.out.println("Rows affected: " + debug);
    } catch (SQLException ex) {
        Logger.getLogger(DAL.class.getName()).log(Level.SEVERE, null, ex);
    }
}
The first print statement prints the correct values, so I know the inputs are correct, and the second print statement tells me that 1 row is affected, which looks correct.
If I run the statement inside Management Studio it runs fine and updates the table, but if I run it from the Java project nothing is updated and no errors are generated.
The db table in question has 4 columns: (int)project_id, (nvarchar)project_name, (int)num_bugs, (int)num_features.
Can anyone help me out with getting this to work and/or spot what's wrong?
You can't bind a column name that way, only variables.
I would recommend that you close that PreparedStatement in method scope in a finally block. Your way is asking for trouble.
I would also call writing to System.out a very bad idea. I'd prefer returning the number of affected rows to the user.
Column names cannot be parameterized in prepared statements. You can parameterize only literal values like strings or numbers.
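If the set of updatable columns is small and known, a common workaround is to validate the column name against a whitelist and splice it into the SQL text, binding only the values. A sketch along those lines, using the column names from the table described above (the whitelist check is what keeps the string concatenation safe):
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Set;

private static final Set<String> UPDATABLE_COLUMNS = Set.of("num_bugs", "num_features");

public void updateProjectOptions(int projectID, int number, String column) throws SQLException {
    if (!UPDATABLE_COLUMNS.contains(column)) {
        throw new IllegalArgumentException("Not an updatable column: " + column);
    }
    // The column name is spliced in (after validation); only values are bound
    String sql = "UPDATE cryptic.dbo.projects SET " + column + " = ? WHERE project_id = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, number + 1);
        ps.setInt(2, projectID);
        ps.executeUpdate();
    }
}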

Avoiding JDBC call

The scenario is like this:
for loop // runs say 200000 times
{
    // here, I do a select from a database, fetching a few rows which are expected to increase with every new iteration of the for loop
    // currently I am doing this select using a simple JDBC call (using JDBC only is NOT a requirement)
    // then I do some string matching and then I either insert or update a particular row (in 95% of cases I will insert)
    // this insert or update is done using Hibernate (using Hibernate here is a requirement)
}
So the problem is: in every iteration, I have to consider each and every previously inserted/updated row. Due to this requirement, I have to make a JDBC call in every single iteration, and this JDBC call is taking the most time, bringing down performance.
I want to know: is there any method by which I do not have to make a JDBC call in each iteration, but can still consider all the records, including the one from the immediately preceding insert/update? Anything like caching or some in-memory data structure?
Here is the code:
for loop // runs say 2000 times
{
    String query = pdi.selectAllPatients(patientInfo);
    Statement st = conn.createStatement();
    ResultSet patientRs = st.executeQuery(query);
    while (patientRs.next()) { // ResultSet has next(), not hasNext()
        // some string ops
    }
    // Create session for DB No. 2
    Session sessionEmpi = sessionFactoryEmpi.getCurrentSession();
    sessionEmpi.beginTransaction();
    if (some condition)
        patientDao.insertPatient(patientInfo, sessionEmpi);
    else
        patientDao.insertref(patientInfo.getref(), sessionEmpi);
    conn.commit();
}

public int insertPatient(PatientInfo input, Session session) throws SQLException {
    try {
        session.save(input.getPatient());
        session.flush();
        session.save(input.getref());
        session.getTransaction().commit();
        return 1;
    } catch (Exception ex) {
        session.getTransaction().rollback();
        ex.printStackTrace();
        return 0;
    }
}
Is the performance of the SELECT consistent? Unless your data is fairly small, you'll likely have trouble caching all your changes in memory. You may also be able to batch the SELECTs, effectively unrolling the loop.
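If the working set does fit in memory, the caching idea could look something like this. A sketch only: loadAllPatientsOnce(), matchKey(), and incomingRecords are hypothetical stand-ins for the asker's actual lookup, matching key, and input source:
import java.util.HashMap;
import java.util.Map;

Map<String, PatientInfo> cache = new HashMap<>();
for (PatientInfo existing : loadAllPatientsOnce()) {    // one SELECT before the loop
    cache.put(existing.matchKey(), existing);
}
for (PatientInfo incoming : incomingRecords) {
    PatientInfo match = cache.get(incoming.matchKey()); // replaces the per-iteration SELECT
    if (match == null) {
        patientDao.insertPatient(incoming, sessionEmpi); // the 95% case
    } else {
        patientDao.insertref(incoming.getref(), sessionEmpi);
    }
    cache.put(incoming.matchKey(), incoming);            // keep the cache current
}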
You can use the PreparedStatement interface instead of the Statement interface: the query text is sent and parsed once, and in each loop iteration you only bind new parameter values, which avoids redundant work on every call and improves performance.
example:
PreparedStatement s = con.prepareStatement("select * from student_master where stu_id = ?");
for (...) {
    s.setString(1, "s002");
    ResultSet rs = s.executeQuery();
    // process rs
}
