Batch insert using jdbcTemplate.batchUpdate confusion - java

Does jdbcTemplate.batchUpdate execute multiple single insert statements OR 1 multi value list insert on the database server?
I know that it sends the complete query payload at once to the server but am not sure how the execution takes place.
Can someone please explain/help?

From question:
Does jdbcTemplate.batchUpdate execute multiple single insert statements OR 1 multi value list insert on the database server?
From comment:
I was curious about int[] org.springframework.jdbc.core.JdbcTemplate.batchUpdate(String sql, List<Object[]> batchArgs, int[] argTypes)
TL;DR: It executes 1 multi-valued list.
Spring Framework is open-source, so it's easy to look at the source code and see what it actually does.
batchUpdate(String sql, List<Object[]> batchArgs, final int[] argTypes)
@Override
public int[] batchUpdate(String sql, List<Object[]> batchArgs, final int[] argTypes) throws DataAccessException {
    if (batchArgs.isEmpty()) {
        return new int[0];
    }
    return batchUpdate(
            sql,
            new BatchPreparedStatementSetter() {
                @Override
                public void setValues(PreparedStatement ps, int i) throws SQLException {
                    Object[] values = batchArgs.get(i);
                    int colIndex = 0;
                    for (Object value : values) {
                        colIndex++;
                        if (value instanceof SqlParameterValue) {
                            SqlParameterValue paramValue = (SqlParameterValue) value;
                            StatementCreatorUtils.setParameterValue(ps, colIndex, paramValue, paramValue.getValue());
                        }
                        else {
                            int colType;
                            if (argTypes.length < colIndex) {
                                colType = SqlTypeValue.TYPE_UNKNOWN;
                            }
                            else {
                                colType = argTypes[colIndex - 1];
                            }
                            StatementCreatorUtils.setParameterValue(ps, colIndex, colType, value);
                        }
                    }
                }
                @Override
                public int getBatchSize() {
                    return batchArgs.size();
                }
            });
}
As can be seen, it calls the following method.
batchUpdate(String sql, final BatchPreparedStatementSetter pss)
@Override
public int[] batchUpdate(String sql, final BatchPreparedStatementSetter pss) throws DataAccessException {
    if (logger.isDebugEnabled()) {
        logger.debug("Executing SQL batch update [" + sql + "]");
    }

    int[] result = execute(sql, (PreparedStatementCallback<int[]>) ps -> {
        try {
            int batchSize = pss.getBatchSize();
            InterruptibleBatchPreparedStatementSetter ipss =
                    (pss instanceof InterruptibleBatchPreparedStatementSetter ?
                    (InterruptibleBatchPreparedStatementSetter) pss : null);
            if (JdbcUtils.supportsBatchUpdates(ps.getConnection())) {
                for (int i = 0; i < batchSize; i++) {
                    pss.setValues(ps, i);
                    if (ipss != null && ipss.isBatchExhausted(i)) {
                        break;
                    }
                    ps.addBatch();
                }
                return ps.executeBatch();
            }
            else {
                List<Integer> rowsAffected = new ArrayList<>();
                for (int i = 0; i < batchSize; i++) {
                    pss.setValues(ps, i);
                    if (ipss != null && ipss.isBatchExhausted(i)) {
                        break;
                    }
                    rowsAffected.add(ps.executeUpdate());
                }
                int[] rowsAffectedArray = new int[rowsAffected.size()];
                for (int i = 0; i < rowsAffectedArray.length; i++) {
                    rowsAffectedArray[i] = rowsAffected.get(i);
                }
                return rowsAffectedArray;
            }
        }
        finally {
            if (pss instanceof ParameterDisposer) {
                ((ParameterDisposer) pss).cleanupParameters();
            }
        }
    });

    Assert.state(result != null, "No result array");
    return result;
}
As can be seen, it creates a single PreparedStatement, enters a loop calling addBatch(), and finally calls executeBatch().
So, the short answer is: 1 multi-valued list.
The full answer is that it likely sends one SQL statement and a multi-valued list to the database server; however, it is entirely up to the JDBC driver how it actually implements batching, mostly limited by what the communication protocol supports, so the only way to know for sure is to trace the communication with the server.
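For reference, a minimal sketch of how that overload is typically called; the JdbcTemplate instance, the users(id, name) table, and the row data are illustrative assumptions, not from the question:

import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

// Sketch: each Object[] is one row of parameters; argTypes maps to the ? placeholders.
List<Object[]> batchArgs = new ArrayList<>();
batchArgs.add(new Object[] {1, "alice"});
batchArgs.add(new Object[] {2, "bob"});

int[] updateCounts = jdbcTemplate.batchUpdate(
        "INSERT INTO users (id, name) VALUES (?, ?)",   // hypothetical table
        batchArgs,
        new int[] {Types.INTEGER, Types.VARCHAR});

Each element of updateCounts is the row count (or Statement.SUCCESS_NO_INFO) reported by the driver for the corresponding batch entry.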

Related

jdbctemplate query cursor not before the first row

I have the following code. It simply executes a query and then imports the result into the other database; the second part is irrelevant to my question/problem.
As I see it, by default jdbcTemplate.query() sets the cursor before the first element. I don't know where my mistake is, but this code skips the first row, as if the cursor's initial state were at the first element. So when the while (resultSet.next()) { starts, it skips the first element...
I tried to force the cursor to move:
resultSet.beforeFirst();
but it throws an error.
Currently I check the cursor position before the loop, and I am also thinking about replacing the loop with a do {} while().
jdbcTemplate.query(sqlPull, resultSet -> {
List<List<Object>> batch = new ArrayList<>();
ResultSetMetaData metaData = resultSet.getMetaData();
int batchCount = 1;
int columnCount = metaData.getColumnCount();
if (resultSet.getRow() == 1) {
addAllColumnsToArray(resultSet, batch, columnCount);
}
while (resultSet.next()) {
addAllColumnsToArray(resultSet, batch, columnCount);
if (batch.size() >= 10000) {
callRestAPI(batch);
VaadinSqlApplication.logger.info("Commit batch: " + batchCount++);
insertTableToDb(sqlPush, push, batch);
batch.clear();
}
}
callRestAPI(batch);
VaadinSqlApplication.logger.info("Commit final batch. " + batchCount);
insertTableToDb(sqlPush, push, batch);
batch.clear();
});
....
private void addAllColumnsToArray(ResultSet resultSet, List<List<Object>> batch, int columnCount)
throws SQLException {
List<Object> row = new ArrayList<>();
for (int i = 1; i <= columnCount; i++) {
row.add(resultSet.getObject(i));
}
batch.add(row);
}
You can try a do { ... } while():
// ...
do {
    addAllColumnsToArray(resultSet, batch, columnCount);
    if (batch.size() >= 10000) {
        callRestAPI(batch);
        VaadinSqlApplication.logger.info("Commit batch: " + batchCount++);
        insertTableToDb(sqlPush, push, batch);
        batch.clear();
    }
} while (resultSet.next());
//...
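As an alternative (a sketch of my own, not part of the original answer), you can let Spring drive the iteration with a RowCallbackHandler, so the cursor is advanced for you and no row can be skipped; sqlPull, sqlPush, push, and insertTableToDb are the names from the question:

// Sketch: Spring calls the callback once per row, so there is no manual next() handling.
List<List<Object>> batch = new ArrayList<>();
jdbcTemplate.query(sqlPull, (RowCallbackHandler) resultSet -> {
    int columnCount = resultSet.getMetaData().getColumnCount();
    List<Object> row = new ArrayList<>();
    for (int i = 1; i <= columnCount; i++) {
        row.add(resultSet.getObject(i));
    }
    batch.add(row);
    if (batch.size() >= 10000) {
        insertTableToDb(sqlPush, push, batch);   // same helper as in the question
        batch.clear();
    }
});
if (!batch.isEmpty()) {
    insertTableToDb(sqlPush, push, batch);       // flush the final partial batch
}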

Spring jdbcTemplate: how to send the complete batch size to the DB2 server in one shot?

While jdbcTemplate.batchUpdate(...) is running, I can see the DB row count increasing gradually (by running count(*) on the table): initially 2k, then 3k, and so on up to 10k. 2k and 3k are not exact numbers; sometimes I get 235 and then 4567.
I was expecting the 10k rows (the batch size) to be committed in one shot. In my understanding, if I initially get a row count of 0, then the next row count I see should be 10k.
I don't want one-by-one inserts, for performance reasons; that's why I used the batch update feature, but it seems it also doesn't commit everything in one shot.
I want to send the data (10k rows) to the DB server only once per batch. Is there anything I should specify in the configuration for this?
Below is the way I am writing the jdbcTemplate batch update; the batch size is 10k.
public void insertRows(...) {
    ...
    jdbcTemplate.batchUpdate(query, new BatchPreparedStatementSetter() {
        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            ...
        }
        @Override
        public int getBatchSize() {
            if (data == null) {
                return 0;
            }
            return data.size();
        }
    });
}
Edit: Added @Transactional to the insertRows method; I still see the same behavior.
Using @Transactional it commits after 10k rows, but when I check the count with UR (select count(*) from mytable with ur) it shows the data being updated gradually (2k, 4k, and so on up to 10k). That means the data goes to the server in chunks (probably one by one). How can I send everything in one shot? This question suggests it is achieved with rewriteBatchedStatements in MySQL; is there anything similar in DB2?
I am using DataSource implementation com.ibm.db2.jcc.DB2BaseDataSource
How about the method below? Specify nbUpdates = 10,000 in your case. I have not tried testing this, so please ignore my answer if it does not work.
// the batch size is set in the BatchPreparedStatementSetter, the number of rows we want to process is equal to the nbUpdates parameter
public int[] batchUpdate(String sql, final long nbUpdates, final BatchPreparedStatementSetter pss) throws DataAccessException {
    if (logger.isDebugEnabled()) {
        logger.debug("Executing SQL batch update [" + sql + "]");
    }
    return (int[]) execute(sql, new PreparedStatementCallback() {
        public Object doInPreparedStatement(PreparedStatement ps) throws SQLException {
            try {
                int batchSize = pss.getBatchSize();
                InterruptibleBatchPreparedStatementSetter ipss = (pss instanceof InterruptibleBatchPreparedStatementSetter ? (InterruptibleBatchPreparedStatementSetter) pss
                        : null);
                if (JdbcUtils.supportsBatchUpdates(ps.getConnection())) {
                    List<Integer> rowsAffected = new ArrayList<Integer>();
                    for (int i = 1; i <= nbUpdates; i++) {
                        pss.setValues(ps, i - 1);
                        if (ipss != null && ipss.isBatchExhausted(i - 1)) {
                            if (logger.isDebugEnabled()) {
                                int batchIdx = (i % batchSize == 0) ? i / batchSize : (i / batchSize) + 1;
                                logger.debug("Batch exhausted - Sending last SQL batch update #" + batchIdx);
                            }
                            int[] res = ps.executeBatch();
                            for (int j = 0; j < res.length; j++) {
                                rowsAffected.add(res[j]);
                            }
                            break;
                        }
                        ps.addBatch();
                        if (i % batchSize == 0 || i == nbUpdates) {
                            if (logger.isDebugEnabled()) {
                                int batchIdx = (i % batchSize == 0) ? i / batchSize : (i / batchSize) + 1;
                                logger.debug("Sending SQL batch update #" + batchIdx);
                            }
                            int[] res = ps.executeBatch();
                            for (int j = 0; j < res.length; j++) {
                                rowsAffected.add(res[j]);
                            }
                        }
                    }
                    int[] result = new int[rowsAffected.size()];
                    for (int i = 0; i < result.length; i++) {
                        result[i] = rowsAffected.get(i).intValue();
                    }
                    return result;
                } else {
                    List<Integer> rowsAffected = new ArrayList<Integer>();
                    for (int i = 0; i < nbUpdates; i++) {
                        pss.setValues(ps, i);
                        if (ipss != null && ipss.isBatchExhausted(i)) {
                            break;
                        }
                        rowsAffected.add(ps.executeUpdate());
                    }
                    int[] rowsAffectedArray = new int[rowsAffected.size()];
                    for (int i = 0; i < rowsAffectedArray.length; i++) {
                        rowsAffectedArray[i] = rowsAffected.get(i);
                    }
                    return rowsAffectedArray;
                }
            } finally {
                if (pss instanceof ParameterDisposer) {
                    ((ParameterDisposer) pss).cleanupParameters();
                }
            }
        }
    });
}
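A sketch of how this helper might be invoked; loadRows(), the mytable columns, and the 10,000-row chunk size are illustrative assumptions, not part of the original answer:

// Sketch: nbUpdates is the total number of rows to send; getBatchSize() is the
// number of rows per executeBatch() round trip to the server.
final List<Object[]> data = loadRows();   // hypothetical data source
int[] counts = batchUpdate(
        "INSERT INTO mytable (col1, col2) VALUES (?, ?)",   // hypothetical table
        data.size(),
        new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                ps.setObject(1, data.get(i)[0]);
                ps.setObject(2, data.get(i)[1]);
            }
            @Override
            public int getBatchSize() {
                return 10000;
            }
        });

Whether a single executeBatch() call reaches DB2 as one multi-row statement is still up to the JCC driver; if I recall correctly the driver has an atomicMultiRowInsert property related to multi-row INSERT behaviour, but check the IBM driver documentation to confirm.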

Cassandra + Hector, force compaction in a test to check that empty rows get deleted

We want to test that if a column has TTL (time-to-live) property it eventually will be removed from cassandra entirely along with the empty row which contained it.
As I understood, the algorithm for testing this behaviour is:
when saving an object, set a TTL for a column
wait until the TTL passes, check that the returned value is null
wait until the GC_GRACE_SECONDS period passes
check that the row also gets removed
And I failed to verify the last item.
As I discovered (eg. here or here and in other places), I need to run compaction. Similar questions have been raised (eg. Hector (Cassandra) Delete Anomaly), but I didn't find anything that helped, and googling hasn't helped much.
So the question is: how can I force a compaction from my integration test (using Hector) to ensure that it behaves as expected? Or are there other ways to do this?
P.S. Truncating a column family is not an option.
Here are the details.
My tests:
private static final String KEYSPACE = "KEYSPACE";
private static final String COLUMN_FAMILY = "COLUMN_FAMILY";
private static final int GC_CRACE_SECONDS = 5;
// sut
private CassandraService cassandraService;
// dependencies
private Cluster cluster = HFactory.getOrCreateCluster("tstCltr",
"localhost:9160");
private Keyspace keyspace;
@BeforeClass
public static void setupBeforeClass() {
EmbeddedCassandraDaemon.getEmbeddedCassandraDaemon();
}
@Before
public void setUp() throws Exception {
keyspace = createKeyspace(KEYSPACE, cluster,
new QuorumAllConsistencyLevelPolicy());
cassandraService = new CassandraService(cluster, KEYSPACE,
COLUMN_FAMILY, GC_CRACE_SECONDS);
}
@Test
public void rowGetsRemovedAfterGCGraceSeconds() throws Exception {
Object obj = "OBJECT";
String rowKey = "key";
String columnName = "columnName";
logger.info("before persisting rows count is {}" + countRows());
cassandraService.persistObjectWithTtl(rowKey, columnName, obj, 5);
logger.info("after persisting rows count is {}" + countRows());
Object value = retrieve(rowKey, columnName);
assertNotNull(value);
logger.info("before TTL passes rows count is {}" + countRows());
TimeUnit.SECONDS.sleep(6);
Object nullValue = retrieve(rowKey, columnName);
assertNull(nullValue);
logger.info("after TTL passes rows count is {}" + countRows());
TimeUnit.SECONDS.sleep(10);
logger.info("wait 10 more seconds... rows count is {}" + countRows());
System.out.println("================================" + countRows());
TimeUnit.SECONDS.sleep(120);
int countRows = countRows();
logger.info("wait 2 more minutes... rows count is {}" + countRows);
assertEquals(0, countRows);
}
Code for persisting:
public void persistObjectWithTtl(Object rowKey, Object columnName,
Object obj, int ttl) {
LOGGER.debug("Persist {} / {}", rowKey, columnName);
HColumn<Object, Object> column = createColumn(columnName, obj,
SERIALIZER, SERIALIZER);
column.setTtl(ttl);
executeInsertion(rowKey, column);
}
private void executeInsertion(Object rowKey, HColumn<Object, Object> column) {
Mutator<Object> mutator = createMutator(keyspace, SERIALIZER);
mutator.addInsertion(rowKey, this.columnFamilyName, column);
mutator.execute();
}
Setting GcGraceSeconds for a column family:
private void addColumnFamily(String keySpaceName, String columnFamilyName,
int gcGraceSeconds) {
ColumnFamilyDefinition columnFamilyDefinition =
createColumnFamilyDefinition(keySpaceName, columnFamilyName);
ThriftCfDef columnFamilyWithGCGraceSeconds =
new ThriftCfDef(columnFamilyDefinition);
columnFamilyWithGCGraceSeconds.setGcGraceSeconds(gcGraceSeconds);
cluster.addColumnFamily(columnFamilyWithGCGraceSeconds);
}
And the code for counting rows, found on SO:
public int countRows() {
int rowCount = 100;
ObjectSerializer serializer = ObjectSerializer.get();
RangeSlicesQuery<Object, Object, Object> rangeSlicesQuery =
HFactory.createRangeSlicesQuery(keyspace, serializer,
serializer, serializer)
.setColumnFamily(COLUMN_FAMILY)
.setRange(null, null, false, 10)
.setRowCount(rowCount);
Object lastKey = null;
int i = 0;
while (true) {
rangeSlicesQuery.setKeys(lastKey, null);
QueryResult<OrderedRows<Object, Object, Object>> result =
rangeSlicesQuery.execute();
OrderedRows<Object, Object, Object> rows = result.get();
Iterator<Row<Object, Object, Object>> rowsIterator = rows.iterator();
if (lastKey != null && rowsIterator != null) {
rowsIterator.next();
}
while (rowsIterator.hasNext()) {
Row<Object, Object, Object> row = rowsIterator.next();
lastKey = row.getKey();
i++;
if (row.getColumnSlice().getColumns().isEmpty()) {
continue;
}
}
if (rows.getCount() < rowCount) {
break;
}
}
return i;
}
Thanks.
Update:
The reason was that the amount of data was not enough for compaction to run, so I needed to insert more data and flush the tables to disk more frequently. I ended up with the following test case:
@Test
public void rowGetsRemovedAfterGCGraceSeconds() throws Exception {
final int expectedAmount = 50000;
logger.info("before persisting rows count is {}", countRows());
for (int i = 0; i < expectedAmount; i++) {
String rowKey = RandomStringUtils.randomAlphanumeric(128);
Object obj = RandomStringUtils.randomAlphanumeric(1000);
cassandraService.persistObjectWithTtl(rowKey, COLUMN_NAME, obj, 20);
if (i % 100 == 0) {
StorageService.instance.forceTableFlush(KEYSPACE, COLUMN_FAMILY);
}
}
logger.info("causing major compaction...");
StorageService.instance.forceTableCompaction(KEYSPACE, COLUMN_FAMILY);
logger.info("after major compaction rows count is {}", countRows());
waitAtMost(Duration.TWO_MINUTES)
.pollDelay(Duration.TWO_SECONDS)
.pollInterval(Duration.ONE_HUNDRED_MILLISECONDS)
.until(new Callable<Boolean>() {
@Override
public Boolean call() throws Exception {
int countRows = countRows();
logger.info("the rows count is {}", countRows);
return countRows < expectedAmount;
}
});
}
full code : test class and sut
Since you're working with Java, you can easily force a compaction through JMX by using the forceTableCompaction(keyspace, columnFamily) method of the org.apache.cassandra.db.StorageService MBean.
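For completeness, a rough sketch (not from the original answer) of invoking that operation over JMX from a test; the default JMX port 7199 and the forceTableCompaction(String, String...) signature depend on your Cassandra version, so verify them first. When the test runs an embedded Cassandra in the same JVM, as in the update above, calling StorageService.instance.forceTableCompaction(...) directly is the simpler route.

// Sketch: connect to Cassandra's JMX port and invoke the MBean operation
// (inside a test method that declares throws Exception).
JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
JMXConnector connector = JMXConnectorFactory.connect(url);
try {
    MBeanServerConnection connection = connector.getMBeanServerConnection();
    ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");
    connection.invoke(storageService,
            "forceTableCompaction",
            new Object[] {KEYSPACE, new String[] {COLUMN_FAMILY}},
            new String[] {String.class.getName(), String[].class.getName()});
} finally {
    connector.close();
}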

How to read a List in batches

I have a function which reads a List of Notification objects. Now I have to read the Notification objects in batches of 100 (say) and update them.
public boolean addBSInfoToTemp(List<ParentNotification> pNotify)
throws DaoException, BatchUpdateException {
int cnt = 0;
final String query = "insert into Temp values ?,?,?,'Y','N',?,?,?,?";
while (!pNotify.isEmpty()) {
try {
pst = getConnection().prepareStatement(query);
for (ParentNotification pn : pNotify) {
cnt++;
pst.setInt(1, pn.getUserId());
pst.setString(2, pn.getEmail());
pst.setString(3, pn.getNotificationType());
Date createdDate = (Date) pn.getCreatedDate();
pst.setDate(4, createdDate);
pst.setString(5, pn.getCreatedBy());
Date icsesCreatedDate = (Date) pn.getIcsesCreatedDate();
pst.setDate(6, icsesCreatedDate);
pst.setString(7, pn.getMiscellaneous());
pst.addBatch();
if(cnt==batchCount){
break;
}
}
int[] batch = pst.executeBatch();
if (batch.length != 0) {
flag = true;
}
} catch (BatchUpdateException b) {
flag = false;
} catch (SQLException sqlx) {
flag = false;
} finally {
close(pst, null);
}
}
return flag;
}
What I am trying to do is read the List with batchCount = 100, update those records, then move on to the 101st record and update the next 100, until the List is exhausted.
You have this:
while (!pNotify.isEmpty()) {
but I don't see that you ever remove objects from the list, so you have an infinite loop. I would just make the outer loop iterate over all of the elements like this:
for (int i = 0; i < pNotify.size(); i++) {
    // and inside, check whether the index is divisible by the batch length
    if (i % batchCount == 0) {
        // execute the batch accumulated so far here
    }
}
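For completeness, a sketch (my addition, not part of the original answer) of walking the list in chunks of batchCount with List.subList, so each chunk becomes one JDBC batch using the prepared statement from the question:

int batchCount = 100;
for (int start = 0; start < pNotify.size(); start += batchCount) {
    int end = Math.min(start + batchCount, pNotify.size());
    List<ParentNotification> chunk = pNotify.subList(start, end);
    // set the parameters and call pst.addBatch() for each element of chunk,
    // then call pst.executeBatch() once per chunk
}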

How to pass an array object from jdbc code

I have a stored procedure get_data(estargs set(char(1000) not null)) in an Informix 11.5 database. I have to use this stored procedure in order to get a value from the database.
I tried it this way, but it fails:
conn = dataSource.getConnection();
String [] arrayObj={"and code = 'Value1'","and lmt= 10000.000"};
CallableStatement test=conn.prepareCall("{call get_data(?)}");
test.setObject(1, arrayObj);
test.execute();
ResultSet testrs = test.getResultSet();
while (testrs.next()) {
int data = testrs.getInt(1);
System.out.println(data);
}
This is not working. What do you think I am doing wrong?
That's not possible. Replace
conn.prepareCall("{call get_data(?)}");
by
conn.prepareCall("{call get_data(?, ?)}");
and replace
test.setObject(1, arrayObj);
by
test.setObject(1, arrayObj[0]);
test.setObject(2, arrayObj[1]);
Related question:
How to set multiple values in IN clause?
Update: to make it all more "dynamic", you can generate and populate the placeholders yourself with the help of the following two utility methods:
public static String preparePlaceHolders(int length) {
StringBuilder builder = new StringBuilder(length * 2 - 1);
for (int i = 0; i < length; i++) {
if (i > 0) builder.append(',');
builder.append('?');
}
return builder.toString();
}
public static void setValues(PreparedStatement preparedStatement, Object... values) throws SQLException {
for (int i = 0; i < values.length; i++) {
preparedStatement.setObject(i + 1, values[i]);
}
}
which can be used as follows:
private static final String SQL_CALL_GET_DATA = "{call get_data(%s)}";
// ...
String sql = String.format(SQL_CALL_GET_DATA, preparePlaceHolders(arrayObj.length));
statement = connection.prepareCall(sql);
setValues(statement, arrayObj);
// ...
Have you tried using java.sql.Array?
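If the Informix JDBC driver supports SQL ARRAY types (an assumption to verify, since many drivers do not implement createArrayOf), the attempt would look roughly like this:

// Sketch: Connection.createArrayOf() requires driver support for ARRAY types.
String[] arrayObj = {"and code = 'Value1'", "and lmt= 10000.000"};
try (CallableStatement stmt = conn.prepareCall("{call get_data(?)}")) {
    java.sql.Array sqlArray = conn.createArrayOf("VARCHAR", arrayObj);
    stmt.setArray(1, sqlArray);
    stmt.execute();
    try (ResultSet rs = stmt.getResultSet()) {
        while (rs.next()) {
            System.out.println(rs.getInt(1));
        }
    }
}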
