The scenario is simple.
I have a fairly large MySQL database containing two tables:
-- Table 1
id (primary key) | some other columns without constraints
-----------------+--------------------------------------
1 | foo
2 | bar
3 | foobar
... | ...
-- Table 2
id_src | id_trg | some other columns without constraints
-------+--------+---------------------------------------
1 | 2 | ...
1 | 3 | ...
2 | 1 | ...
2 | 3 | ...
2 | 5 | ...
...
On table1 only id is a primary key. This table contains about 12M entries.
On table2, id_src and id_trg together form the primary key; both columns have foreign key constraints on table1's id with ON DELETE CASCADE enabled. This table contains about 110M entries.
OK, now all I'm doing is building a list of ids that I want to remove from table1 and then executing a simple DELETE FROM table1 WHERE id IN (<the list of ids>);
As you may have guessed, this delete also removes the corresponding rows from table2. So far so good, but the problem is that when I run this in a multi-threaded environment I get many deadlocks!
A few notes:
There is no other process running at the same time, nor will there be (for the time being)
I want this to be fast! I'm using about 24 threads (if that makes any difference to the answer)
I have already tried almost all transaction isolation levels (except TRANSACTION_NONE); see Java SQL Connection transaction isolation
I don't think ordering/sorting the ids would help!
I have already tried SELECT ... FOR UPDATE, but then even a simple DELETE takes up to 30 seconds (so there is no point in using it):
DELETE FROM table1
WHERE id IN (
    SELECT id FROM (
        SELECT * FROM table1
        WHERE id = 'some_id'
        FOR UPDATE) AS x);
How can I fix this?
I would appreciate any help and thanks in advance :)
Edit:
Using InnoDB engine
On a single thread this process would take a dozen hours, maybe even a whole day, but I'm aiming for a few hours!
I'm already using a connection pool manager: java.util.concurrent
For an explanation of the double-nested SELECTs, please refer to MySQL "can't specify target table for update in FROM clause"
The list to be deleted from the DB may contain a couple of million entries in total, which is divided into chunks of 200
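The chunking mentioned above (splitting the id list into groups of 200) can be sketched as follows; `chunk` is an illustrative helper, not part of the original code:

```java
import java.util.ArrayList;
import java.util.List;

public class IdChunker {
    // Split a list of ids into fixed-size chunks; each chunk then feeds one
    // "DELETE FROM table1 WHERE id IN (...)" statement.
    public static <T> List<List<T>> chunk(List<T> ids, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            chunks.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 450; i++) ids.add(i);
        List<List<Integer>> chunks = chunk(ids, 200);
        System.out.println(chunks.size() + " chunks"); // 3 chunks: 200 + 200 + 50 ids
    }
}
```

Smaller chunks shorten each statement's lock footprint, at the cost of more round trips.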
I use the FOR UPDATE clause because I've heard it locks a single row instead of locking the whole table
The app uses Spring's batchUpdate(String sqlQuery) method, so the transactions are managed automatically
All ids are indexed, and they are unique strings of at most 50 characters
ON DELETE CASCADE on id_src and id_trg (each separately) means that every delete of id=x in table1 also deletes the rows in table2 where id_src=x or id_trg=x
Some code as requested:
public void write(List data){
    try {
        ArrayList<String> idsToDelete = getIdsToDelete();
        // The ids are rendered as a comma-separated list for the IN clause
        String query = "DELETE FROM table1 WHERE id IN (" + String.join(", ", idsToDelete) + ")";
        mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(query);
    } catch (Exception e) {
        LOG.error(e);
    }
}
where mysqlJdbcTemplate is just an abstract class that extends JdbcDaoSupport.
First of all, your simple DELETE query with a list of ids should not cause problems as long as you pass a limited number of ids (say up to 1,000, with the total number of affected child rows also staying moderate, not 10,000 or more). But if you pass something like 50,000 ids or more, it can create locking issues.
To avoid deadlocks, you can take the following approach (assuming the bulk deletion will not be part of the production system):
Step 1: Fetch all ids with a SELECT query and keep them in a cursor.
Step 2: Delete the ids held in the cursor one by one, in a stored procedure.
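A minimal Java sketch of Step 2, where `deleteById` stands in for whatever executes and commits the single-row DELETE (a stored procedure call or a prepared statement):

```java
import java.util.List;
import java.util.function.Consumer;

public class OneByOneDelete {
    // Delete ids one at a time so each statement only locks one parent row
    // (plus its cascaded child rows) instead of locking a large range.
    public static int deleteOneByOne(List<String> ids, Consumer<String> deleteById) {
        int issued = 0;
        for (String id : ids) {
            deleteById.accept(id); // real code: bind id into "DELETE FROM table1 WHERE id = ?" and commit
            issued++;
        }
        return issued;
    }
}
```

Because each delete commits on its own, a conflict can only ever involve one row's locks at a time.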
Note: to check why the deletion acquires locks, we would have to check several things, e.g. how many ids you are passing per statement, which transaction isolation level is set at the DB level, and your MySQL configuration settings in my.cnf.
It may be dangerous to delete many (> 10,000) parent records when each one has child records deleted by cascade, because the more records you delete at a time, the higher the chance of a lock conflict leading to a deadlock or a rollback.
If it is acceptable (meaning you can make a direct JDBC connection to the database), you should (no threading involved here):
compute the list of ids to delete
delete them in batches (between 10 and 100 a priori), committing every 100 or 1000 records
As the heavy work happens on the database side, I strongly doubt that threading will help here. If you want to try it anyway, I would recommend:
one single thread (with a dedicated database connection) computing the list of ids to delete and feeding them into a synchronized queue
a small number of threads (4, maybe 8), each with its own database connection, that:
use a prepared DELETE FROM table1 WHERE id = ? in batches
take ids from the queue and prepare the batches
send a batch to the database every 10 or 100 records
do a commit every 10 or 100 batches
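The consumer side of the layout above can be sketched as follows; the flush is a stand-in for executeBatch() plus a periodic commit, and BATCH_SIZE is an illustrative choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class BatchedDeleter {
    static final int BATCH_SIZE = 100;

    // Drain ids from the shared queue and flush them in batches of BATCH_SIZE.
    // A real worker would block on BlockingQueue.take(), and each flush would
    // run a prepared "DELETE FROM table1 WHERE id = ?" batch on its own connection.
    public static List<List<String>> consume(Queue<String> queue) {
        List<List<String>> flushed = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        String id;
        while ((id = queue.poll()) != null) {
            batch.add(id);
            if (batch.size() == BATCH_SIZE) {
                flushed.add(new ArrayList<>(batch)); // stand-in for executeBatch() + commit
                batch.clear();
            }
        }
        if (!batch.isEmpty()) flushed.add(batch); // flush the trailing partial batch
        return flushed;
    }
}
```

The single producer thread fills the queue; each of the few consumer threads runs a loop like this against its own connection.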
I cannot imagine the whole process taking more than several minutes.
After some further reading, it looks like I was used to older systems and my numbers are really conservative.
OK, here's what I did. It may not actually avoid deadlocks, but it was my only option at the time.
This solution is a way of handling MySQL deadlocks using Spring.
Catch and retry deadlocks:
public void write(List data){
    try {
        ArrayList<String> idsToDelete = getIdsToDelete();
        String query = "DELETE FROM table1 WHERE id IN (" + idsToDelete + ")";
        try {
            mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(query);
        } catch (org.springframework.dao.DeadlockLoserDataAccessException e) {
            LOG.info("Caught DEADLOCK : " + e);
            retryDeadlock(query); // retry the failed statement
        }
    } catch (Exception e) {
        LOG.error(e);
    }
}
public void retryDeadlock(final String... sqlQuery) { // varargs, so a single query can be passed too
    RetryTemplate template = new RetryTemplate();
    TimeoutRetryPolicy policy = new TimeoutRetryPolicy();
    policy.setTimeout(30000L); // keep retrying for up to 30 seconds
    template.setRetryPolicy(policy);
    try {
        template.execute(new RetryCallback<int[]>() {
            public int[] doWithRetry(RetryContext context) {
                LOG.info("Retrying DEADLOCK " + context);
                return mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(sqlQuery);
            }
        });
    } catch (Exception e1) {
        e1.printStackTrace();
    }
}
Another solution could be to use Spring Batch's multi-step mechanism, so that the DELETE is split into three steps: the first step deletes the rows referencing the blocking column, and the other steps delete the remaining child references and the parent rows:
Step1: Delete id_trg from child table;
Step2: Delete id_src from child table;
Step3: Delete id from parent table;
Of course the last two steps could be merged into one, but in that case two distinct ItemWriters would be needed!
Related
I need to delete items from two databases: one internal, managed by my team, and another managed by a different team (they hold different but related data). The constraint is that if either of the deletes fails, the entire operation should be cancelled and rolled back.
Now, I can control and access my own database easily, but not the database managed by the other team. My line of thought is as follows:
delete from my database first (if it fails, abort everything straightaway)
assuming step 1 succeeds, now I call the API from the other team to delete the data on their side as well
if step 2 succeeds, all is good... if it fails, I'll roll back the delete on my database in step 1
To achieve step 3, I think I will have to keep the data from step 1 in some variables within the function. Roughly speaking...
public void deleteData(String id) {
    // Entity stands for the actual entity type; keep a copy in case we need to restore it
    Optional<Entity> entityToBeDeleted = getEntity(id);
    try {
        deleteFromMyDB(id);
    } catch (Exception e) {
        throw e; // local delete failed: abort everything straight away
    }
    try {
        deleteFromOtherDB(id);
    } catch (Exception e) {
        persistInMyDB(entityToBeDeleted); // compensate: re-insert the local record
        throw e;
    }
}
Now I am aware that the above code looks horrible. Any guru can give me some advice on how to do this better?
What does it mean if the remote deletion fails? That the deletion should not happen at all?
Can the local deletion fail for a non-transient reason?
A possible solution is:
Create a "pending deletions" table in your database which will contain the keys of records you want to delete.
When you need to delete record, insert a row in this table.
Then delete the record from the remote system.
If this succeeds, delete the "pending deletion" record and the local record, preferably in a single transaction.
Whenever you start your system, check the "pending deletion" table and delete any records mentioned there from the local and remote systems (I assume both of these operations are idempotent). Then delete the "pending deletion" record.
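The whole flow can be sketched in memory; the map and set stand in for the local database and the "pending deletions" table, and `remoteDelete` is a hypothetical call to the other team's API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

public class PendingDeletion {
    final Map<String, String> localDb = new HashMap<>(); // stand-in for our own database
    final Set<String> pending = new HashSet<>();         // stand-in for the "pending deletions" table
    final Consumer<String> remoteDelete;                 // stand-in for the other team's delete API

    PendingDeletion(Consumer<String> remoteDelete) {
        this.remoteDelete = remoteDelete;
    }

    // Record the intent first, then delete remotely, then locally.
    // If the remote call throws, the pending marker survives for recovery.
    public void delete(String id) {
        pending.add(id);         // 1. durable marker
        remoteDelete.accept(id); // 2. remote delete (may fail)
        localDb.remove(id);      // 3. local delete + marker removal
        pending.remove(id);      //    (a single transaction in the real system)
    }

    // On startup, finish any interrupted deletions (both operations are idempotent).
    public void recover() {
        for (String id : new HashSet<>(pending)) {
            delete(id);
        }
    }
}
```

If the process dies between steps 2 and 3, the next recover() run simply replays the delete, which is safe precisely because both deletes are idempotent.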
I have a table that maintains a sequence number used as an identifier for multiple tables (several invoice tables all share this single sequence).
Whenever I want to insert a new record into an invoice table, I read the current sequence number from the table and update it with +1.
The problem is that when there are multiple concurrent requests for new invoice numbers, the method returns duplicate numbers. I tried a synchronized block, but it still returns duplicate values when multiple requests hit at the same time.
Here is the method to retrieve the sequence number
synchronized public int getSequence(){
    Sequence sequence = getCurrentSession().get(Sequence.class, 1); // here 1 is the id of the row
    int number = sequence.getSequenceNumber();
    sequence.setSequenceNumber(number + 1);
    getCurrentSession().saveOrUpdate(sequence);
    return number;
}
Is there something I am missing?
First of all, I would not recommend a table-based implementation of the sequence (explanation why).
But if you have to, Hibernate knows how to manage it (take a look).
And one more thing: I strongly recommend implementing the synchronization on the database side. Imagine you have two instances of your application connected to the same database instance and working simultaneously; an in-JVM synchronized block cannot protect you then.
Using transactions also didn't work for me. I tried all the isolation levels in MySQL, but nothing helped. I solved it with the solution below.
synchronized public int getSequence() throws Exception {
    Sequence sequence = getCurrentSession().get(Sequence.class, 1); // here 1 is the id of the row
    int prevNumber = sequence.getSequenceNumber();
    // A bulk UPDATE cannot take a result class, so createQuery is called without Sequence.class
    Query query = getCurrentSession().createQuery(
            "UPDATE Sequence SET sequenceNumber = :number WHERE sequenceNumber = :prevNumber");
    query.setParameter("number", prevNumber + 1);
    query.setParameter("prevNumber", prevNumber);
    int affectedRows = query.executeUpdate();
    if (affectedRows > 0)
        return prevNumber + 1;
    else
        throw new Exception("sequence conflict");
}
So whenever a conflict happens, it will throw an exception and the caller can retry.
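Since the guarded UPDATE fails under contention, the caller needs a retry loop. Here is a sketch where an AtomicInteger stands in for the sequence row, with compareAndSet playing the role of the `WHERE sequenceNumber = :prevNumber` update:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SequenceClient {
    // Stand-in for the database row that holds the current sequence number.
    static final AtomicInteger row = new AtomicInteger(0);

    // Read, then conditionally update, retrying on conflict. This mirrors
    // calling getSequence() again when the guarded UPDATE affects 0 rows.
    public static int nextSequence() {
        while (true) {
            int prev = row.get();                    // SELECT sequenceNumber ...
            if (row.compareAndSet(prev, prev + 1)) { // UPDATE ... WHERE sequenceNumber = :prev
                return prev + 1;                     // success: we own this number
            }
            // affectedRows == 0: another caller won the race; loop and retry
        }
    }
}
```

Two consecutive calls always return strictly increasing numbers, which is the property the invoice tables need.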
I have a method that can be described with the following steps:
Insert rows into temporary table 1.
Insert rows into temporary table 2.
Insert (inner join of table 1 + table 2) into temporary table 3.
Select rows of temporary table 3.
The steps are executed sequentially. However, the method is slow, and I want to parallelize STEP 1 and STEP 2 because they are independent. It is important to know that the three temporary tables have the clause "ON COMMIT DELETE ROWS", so all the steps must be performed in a single transaction.
private void temporaryTables() {
    String st1 = "insert into table1(name) values('joe')";
    String st2 = "insert into table2(name) values('foo')";
    jdbcTemplate.update(st1);
    jdbcTemplate.update(st2);
    // Arrays.asList(st1, st2).parallelStream().forEach(x -> jdbcTemplate.update(x));
    // If I use the parallel stream and then select from both tables, one table is empty.
}
@Transactional
public List<Response> method() {
temporaryTables();
return jdbcTemplate.query(SELECT_TABLE_3, new BeanPropertyRowMapper<>(Response.class));
}
If I uncomment the parallel code, it doesn't work as expected: only the caller thread joins the transaction, the other thread does not execute in the same transaction, and therefore STEP 3 fails because one temporary table is empty.
I also tried raw JDBC transactions. However, I can't share the Connection object because access to it is synchronized.
How can I solve this problem?
We are running batched DELETE statements on 5 tables using the DataStax Java driver. We noticed that on some occasions the record doesn't get deleted from one of the tables; instead, the non-key fields are set to null.
We have been using the key columns of this table for a duplicate check, so when we later create a new record with the same values, it fails our duplicate check.
Is this expected behavior? If yes, can we override it to physically delete the row (the expected behavior) as part of our batch execution?
Also, why does it always happen with just 1 (always the same) table out of the total 5?
Table definition:
CREATE TABLE customer_ks.customer_by_accountnumber (
    customerid text,
    accountnumber text,
    accounttype text,
    age int,
    name text,
    PRIMARY KEY (customerid, accountnumber)
);
and here is the query I run on this table as part of the batch:
DELETE FROM customer_by_accountnumber WHERE customerid=? AND accountnumber=?
along with deletes on 4 other tables:
try {
    LOG.debug("Deleting customer " + customer);
    BatchStatement batch = new BatchStatement();
    batch.add(customerDeleteStmt.bind(customer.getId()));
    batch.add(customerByAccountNumberDeleteStmt.bind(customer.getId(), customer.getAcctNum()));
    // 3 more cleanup statements in the batch...
    batch.add(...);
    batch.add(...);
    batch.add(...);
    cassandraTemplate.execute(batch);
    LOG.info("Deleted Customer " + customer);
} catch (Exception e) {
    LOG.error("Exception in batched Delete.", e);
    throw new DatabaseException("Error: CustomerDAO.delete failed. Exception is: " + e.getMessage());
}
UPDATE:
This doesn't seem to be a delete issue, as I initially suspected.
Upon investigation it turned out that the batched delete worked as expected. What caused the anomaly in this table was an update (issued after the batched delete of the same row) that set one column to an empty string (not null). In this case Cassandra issues an insert (instead of the no-op we expected), resulting in a new row with null values in the non-key columns and an empty value in the updated column.
Changing the code to set the column value to null in the update statement fixed our issue.
Please advise how I can mark this as a non-issue/resolved (whichever is appropriate).
thanks.
I am using Apache DBCP for connection pooling and iBATIS to do the database transactions with Spring support. The scenario that I am trying to work out is:
create BasicDataSource with max initial connection as 5
Create a temp table
Write bulk of records in temp table.
Write the records onto actual table.
Delete the temp table
The issue here is that steps 2-5 run in multi-threaded mode. Also, since I am using connection pooling, I cannot guarantee that steps 2, 3, 4 and 5 will get the same connection object from the pool, and hence in steps 3/4/5 I see "temp table XYZ not found".
How can I guarantee that the same connection is reused across the 4 operations? Here's the code for steps 3 and 4. I am not planning to use a global temp table.
@Transactional
public final void insertInBulk(final List<Rows> rows) {
    getSqlMapClientTemplate().execute(new SqlMapClientCallback<Object>() {
        public Object doInSqlMapClient(SqlMapExecutor exe) throws SQLException {
            exe.startBatch();
            for (Rows row : rows) {
                for (Object item : row.getMultiRows()) {
                    exe.insert("##TEMPTABLE.insert", item);
                }
            }
            exe.executeBatch();
            return null;
        }
    });
}
public void copyValuesToActualTable() {
    final Map<String, Object> procInput = new HashMap<String, Object>();
    procInput.put("tableName", "MYTABLE");
    getSqlMapClientTemplate().queryForObject("##TEMPTABLE.NAME", procInput);
}
I am thinking of improving the design further by creating the temp table just once when the connection is initialised, and truncating it instead of dropping it, but that is for later, and it would still leave the issue with steps 3 and 4. The reason for the temp table is that I don't have permission to modify the actual table directly, only via the temp table.
I would actually create the temp table (step 2) in the main thread, then break the workload of inserting records into the temp table (steps 3 and 4) into chunks and spawn a thread for each chunk.
JDK 7 provides ForkJoin for this, which you may find interesting.
Once the insertion into the temp and actual tables is done, delete the temp table, again in the main thread.
This way you don't need to ensure that the same connection is used everywhere. You can use different connection objects to the same database and perform steps 3 & 4 in parallel.
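The chunk-and-spawn idea can be sketched with the ForkJoin framework; here a ConcurrentLinkedQueue stands in for the temp table, and the chunk size is an illustrative assumption:

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelTempInsert {
    static final int CHUNK = 100; // rows per leaf task (illustrative)

    // Recursively split the row list; each leaf "inserts" its chunk.
    // In the real system a leaf would batch-insert via its own connection.
    static class InsertTask extends RecursiveAction {
        final List<String> rows;
        final ConcurrentLinkedQueue<String> tempTable;

        InsertTask(List<String> rows, ConcurrentLinkedQueue<String> tempTable) {
            this.rows = rows;
            this.tempTable = tempTable;
        }

        @Override
        protected void compute() {
            if (rows.size() <= CHUNK) {
                tempTable.addAll(rows); // stand-in for the chunk's batch insert
            } else {
                int mid = rows.size() / 2;
                invokeAll(new InsertTask(rows.subList(0, mid), tempTable),
                          new InsertTask(rows.subList(mid, rows.size()), tempTable));
            }
        }
    }

    public static ConcurrentLinkedQueue<String> insertAll(List<String> rows) {
        ConcurrentLinkedQueue<String> tempTable = new ConcurrentLinkedQueue<>();
        ForkJoinPool.commonPool().invoke(new InsertTask(rows, tempTable));
        return tempTable;
    }
}
```

The queue here is shared memory; in the database version each leaf task would use its own pooled connection, as the answer describes.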
Hope this helps.