I need to delete items from two databases: one internal, managed by my team, and another managed by a different team (they hold different but related data). The constraint is that if either delete fails, the entire operation should be cancelled and rolled back.
Now, I can control and access my own database easily, but not the database managed by the other team. My line of thought is as follows:
1. Delete from my database first (if it fails, abort everything straightaway).
2. Assuming step 1 succeeds, call the other team's API to delete the data on their side as well.
3. If step 2 succeeds, all is good. If it fails, roll back the delete on my database from step 1.
In order to achieve step 3, I think I will have to save the data in step 1 in some variables within the function. Roughly speaking...
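    public void deleteData(String id) {
        // Keep a copy so the local delete can be undone in step 3.
        // MyEntity stands in for the actual entity type.
        Optional<MyEntity> entityToBeDeleted = getEntity(id);
        try {
            deleteFromMyDB(id);
        } catch (Exception e) {
            throw e;
        }
        try {
            deleteFromOtherDB(id);
        } catch (Exception e) {
            // Remote delete failed: restore the local record, then rethrow.
            persistInMyDB(entityToBeDeleted);
            throw e;
        }
    }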
Now I am aware that the above code looks horrible. Can anyone give me some advice on how to do this better?
What does it mean if the remote deletion fails? That the deletion should not happen at all?
Can the local deletion fail for a non-transient reason?
A possible solution is:
1. Create a "pending deletions" table in your database that holds the keys of the records you want to delete.
2. When you need to delete a record, insert a row into this table.
3. Then delete the record from the remote system.
4. If this succeeds, delete the "pending deletion" row and the local record, preferably in a single transaction.
5. Whenever your system starts, check the "pending deletions" table and delete any records mentioned there from both the local and remote systems (I assume that both of these operations are idempotent). Then delete the "pending deletion" row.
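The pending-deletions steps above can be sketched roughly as follows. This is only an in-memory illustration under stated assumptions: RemoteStore, PendingDeletionCoordinator, and the two collections standing in for the local tables are hypothetical names, and a real implementation would use database tables and a transaction for the final step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the other team's delete API; assumed to be idempotent. */
interface RemoteStore {
    void delete(String id) throws Exception;
}

/** In-memory sketch of the "pending deletions" pattern (not thread-safe). */
class PendingDeletionCoordinator {
    // Stand-ins for two tables in the local database.
    final Map<String, String> localRecords = new HashMap<>();
    final Set<String> pendingDeletions = new LinkedHashSet<>();
    private final RemoteStore remote;

    PendingDeletionCoordinator(RemoteStore remote) {
        this.remote = remote;
    }

    /** Records the intent to delete first, then tries to complete the deletion. */
    void delete(String id) throws Exception {
        pendingDeletions.add(id);
        complete(id);
    }

    /** Remote delete first; only if it succeeds are the local rows removed. */
    private void complete(String id) throws Exception {
        remote.delete(id); // if this throws, the pending row stays for a later retry
        // In a real database these two removals would share one transaction.
        localRecords.remove(id);
        pendingDeletions.remove(id);
    }

    /** Startup recovery: retry every deletion that is still marked as pending. */
    void recover() {
        for (String id : new ArrayList<>(pendingDeletions)) {
            try {
                complete(id);
            } catch (Exception e) {
                // Leave the entry pending; it will be retried on the next recovery.
            }
        }
    }
}
```

Note that with this design a remote failure does not restore anything locally: the record simply stays marked as pending until a later retry removes it from both sides.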
I'm looking for a way to cancel a failed insert, using Hibernate.
Context: I have a program that formats data and transfers it from a source database to a destination Oracle database. Since I have a lot of data to process, I want to insert in bulk (e.g., batches of 100 rows). The problem is that an insert can sometimes fail because of bad data (typically, trying to insert a 50-character string into a field that holds at most 32). I could work around this by validating each row before trying to insert it, but I'm looking for another way to do it.
I tried to do something like this:
    List<MyDataObject> dataList = processData();
    HibernateUtils myUtils = HibernateUtils.getInstance();
    myUtils.openTransaction(); // opens the transaction so it is not automatically committed after every insert
    int i = 0;
    for (MyDataObject data : dataList) {
        myUtils.setSavepoint(); // creates a savepoint
        try {
            myUtils.insertData(data); // does not commit, but persists the data object into the DB
            myUtils.flush();
        } catch (RuntimeException e) {
            myUtils.rollbackSavepoint(); // rolls back to the savepoint created right before inserting the last element
            myUtils.commitTransaction();
            i = 0;
            continue;
        }
        if (++i == 100) {
            myUtils.commitTransaction();
            i = 0;
        }
    }
    myUtils.closeTransaction();
However, it doesn't work: the unflushed, failed insert is not rolled back even though I roll back to the savepoint created just before inserting, probably because it was never actually flushed in the first place (the flush itself throws an error because of the bad format).
My savepoint rollback itself works: if I throw a "fake" RuntimeException after inserting an element, that element does not end up in the database.
How can I work around the problem? (I'd like a way to discard the unflushed SQL statements while keeping the flushed ones in the transaction.)
Thank you in advance for any help
I am working on a monitoring tool developed in Spring Boot using Hibernate as ORM.
I need to check each row in my table (already persisted rows of sent messages) and see whether its MailId (unique) has received feedback (status: OPENED, BOUNCED, DELIVERED...) or not.
I get the feedback by reading CSV files from a network folder. Parsing and reading the files is very fast, but updating my database is very slow. My algorithm is not very efficient because I loop through a list that can contain hundreds of thousands of objects and look each one up in my table.
This is the method that performs the update by modifying the "target" object (a row in the database table):
    @Override
    public void updateTargetObjectFoo() throws CSVProcessingException, FileNotFoundException {
        // performProcessing reads the files in a folder, parses them into Java objects,
        // and maps them into a feedBackList of type Foo
        List<Foo> feedBackList = performProcessing(env.getProperty("foo_in"), EXPECTED_HEADER_FIELDS_STATUS, Foo.class, ".LETTERS.STATUS.");
        for (Foo foo : feedBackList) {
            // findByKey does a simple SELECT in MySQL where MailId = foo.getMailId()
            Foo persistedFoo = fooDao.findByKey(foo.getMailId());
            if (persistedFoo != null) {
                persistedFoo.setStatus(foo.getStatus());
                persistedFoo.setDnsCode(foo.getDnsCode());
                persistedFoo.setReturnDate(foo.getReturnDate());
                persistedFoo.setReturnTime(foo.getReturnTime());
                // saveAccount does a MySQL UPDATE on the table
                fooDao.saveAccount(foo);
            }
        }
    }
What if I did the selection/comparison and update on the Java side and then wrote the whole list back to the database?
Would that be faster?
Thanks to all for your help.
Hibernate is not particularly well-suited for batch processing.
You may be better off using Spring's JdbcTemplate to do JDBC batch processing.
However, if you must do this via Hibernate, this may help: https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/chapters/batch/Batching.html
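As a rough illustration of what JDBC batching could look like for the update loop in the question, here is a minimal sketch that uses plain java.sql batching rather than JdbcTemplate. The Feedback class, the table name foo, and the columns status and mail_id are assumptions made for the example, not names taken from the question's schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value object for one parsed CSV feedback row. */
final class Feedback {
    final String mailId;
    final String status;

    Feedback(String mailId, String status) {
        this.mailId = mailId;
        this.status = status;
    }
}

final class FeedbackBatchUpdater {
    // Table and column names are assumptions for this sketch.
    private static final String SQL = "UPDATE foo SET status = ? WHERE mail_id = ?";

    /** Splits a list into consecutive chunks of at most {@code size} elements. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return chunks;
    }

    /** Queues one UPDATE per row, sends them in JDBC batches, and commits after each batch. */
    static void updateAll(Connection conn, List<Feedback> rows, int batchSize) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (List<Feedback> chunk : partition(rows, batchSize)) {
                for (Feedback f : chunk) {
                    ps.setString(1, f.status);
                    ps.setString(2, f.mailId);
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();
            }
        } catch (SQLException e) {
            conn.rollback(); // undo the batch that was in progress
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Because the UPDATE filters on the mail id itself, the separate SELECT per row is no longer needed: rows without a match are simply not updated.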
The scenario is simple.
I have a somewhat large MySQL db containing two tables:
-- Table 1
id (primary key) | some other columns without constraints
-----------------+--------------------------------------
1 | foo
2 | bar
3 | foobar
... | ...
-- Table 2
id_src | id_trg | some other columns without constraints
-------+--------+---------------------------------------
1 | 2 | ...
1 | 3 | ...
2 | 1 | ...
2 | 3 | ...
2 | 5 | ...
...
In table1 only id is a primary key. This table contains about 12M rows.
In table2, id_src and id_trg together form the primary key, and each has a foreign key constraint on table1's id with ON DELETE CASCADE enabled. This table contains about 110M rows.
What I'm doing is building a list of ids that I want to remove from table1 and then executing a simple DELETE FROM table1 WHERE id IN (<the list of ids>);
As you may have guessed, this also deletes the corresponding rows from table2. So far so good, but the problem is that when I run this in a multi-threaded environment I get many deadlocks!
A few notes:
There is no other process running at the same time nor will be (for the time being)
I want this to be fast! I have about 24 threads (if this does make any difference in the answer)
I have already tried almost all of the transaction isolation levels (except TRANSACTION_NONE): Java sql connection transaction isolation
Ordering/sorting the ids would not help, I think!
I have already tried SELECT ... FOR UPDATE, but then a simple DELETE takes up to 30 seconds (so there is no point in using it):
    DELETE FROM table1
    WHERE id IN (
        SELECT id FROM (
            SELECT * FROM table1
            WHERE id='some_id'
            FOR UPDATE) AS x);
How can I fix this?
I would appreciate any help and thanks in advance :)
Edit:
Using InnoDB engine
On a single thread this process would take a dozen hours, maybe even a whole day, but I'm aiming for a few hours!
I'm already using a connection pool manager: java.util.concurrent
For an explanation of the double-nested SELECTs, please refer to MySQL can’t specify target table for update in FROM clause
The list of ids to be deleted may contain a couple of million entries in total, divided into chunks of 200
I used the FOR UPDATE clause because I've heard that it locks individual rows instead of the whole table
The app uses Spring's batchUpdate(String sqlQuery) method, so the transactions are managed automatically
All ids are indexed, and the ids are unique and at most 50 characters long!
ON DELETE CASCADE on id_src and id_trg (each separately) means that every delete on table1 id=x leads to deletes on table2 where id_src=x or id_trg=x
Some code as requested:
    public void write(List data) {
        try {
            ArrayList idsToDelete = getIdsToDelete();
            String query = "DELETE FROM table1 WHERE id IN (" + idsToDelete + " )";
            mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(query);
        } catch (Exception e) {
            LOG.error(e);
        }
    }
and mysqlJdbcTemplate is just an abstract class that extends JdbcDaoSupport.
First of all, your simple DELETE query with a list of ids should not cause problems as long as you pass a limited number of ids, say up to 1,000 (and the number of affected rows in the child table is of a similar order, not far larger, such as 10,000). If you pass 50,000 or more, it can cause locking issues.
To avoid deadlocks, you can follow the approach below (assuming bulk deletion will not be part of the production system):
Step 1: Fetch all ids with a SELECT query and keep them in a cursor.
Step 2: Delete the ids stored in the cursor one by one in a stored procedure.
Note: To find out why the deletion is acquiring locks, we have to check several things, such as how many ids you are passing, which transaction isolation level is set at the DB level, what your MySQL configuration in my.cnf is, etc.
It may be dangerous to delete many (> 10000) parent records that each have child records deleted by cascade, because the more records you delete at once, the greater the chance of a lock conflict leading to a deadlock or rollback.
If it is acceptable (meaning you can make a direct JDBC connection to the database), you should do the following, with no threading involved:
- compute the list of ids to delete
- delete them in batches (between 10 and 100 a priori), committing every 100 or 1000 records
As the heaviest work is on the database side, I strongly doubt that threading will help here. If you want to try it, I would recommend:
- one single thread (with a dedicated database connection) that computes the list of ids to delete and feeds them into a synchronized queue
- a small number of threads (4, maybe 8), each with its own database connection, that:
  - use a prepared DELETE FROM table1 WHERE id = ? in batches
  - take ids from the queue and prepare the batches
  - send a batch to the database every 10 or 100 records
  - commit every 10 or 100 batches
I cannot imagine that the whole process could take more than several minutes.
After some further reading, it looks like I was used to older systems and that my numbers are really conservative.
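The queue-based variant could be sketched along these lines. This is a minimal illustration, not a tuned implementation: QueuedBatchDeleter and the batchSink parameter are hypothetical, and the sink stands in for the JDBC part (a prepared DELETE FROM table1 WHERE id = ? sent with addBatch()/executeBatch() on each worker's own connection, with periodic commits), which is left out so that the coordination logic can run without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Sketch: one producer feeds ids into a queue; a few workers drain it in batches. */
final class QueuedBatchDeleter {
    // Unique sentinel object telling a worker to stop (compared by identity).
    private static final String POISON = new String("POISON");

    /**
     * @param ids       ids to delete (produced here by the calling thread)
     * @param workers   number of worker threads
     * @param batchSize number of ids per batch
     * @param batchSink receives each batch; in real use it would issue the batched DELETEs
     */
    static void run(List<String> ids, int workers, int batchSize,
                    Consumer<List<String>> batchSink) {
        if (workers <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("workers and batchSize must be positive");
        }
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        String id = queue.take();
                        if (id == POISON) {
                            break; // no more work for this worker
                        }
                        batch.add(id);
                        if (batch.size() == batchSize) {
                            batchSink.accept(new ArrayList<>(batch));
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) {
                        batchSink.accept(new ArrayList<>(batch)); // flush the remainder
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        try {
            for (String id : ids) {
                queue.put(id);
            }
            for (int w = 0; w < workers; w++) {
                queue.put(POISON); // one stop signal per worker
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while deleting", e);
        }
    }
}
```

Since each id ends up in exactly one batch, two workers never delete the same parent row, although cascaded deletes in the child table can still touch overlapping rows.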
OK, here's what I did. It might not actually avoid deadlocks, but it was my only option for the time being.
This solution is actually a way of handling MySQL Deadlocks using Spring.
Catch and retry Deadlocks:
    public void write(List data) {
        try {
            ArrayList idsToDelete = getIdsToDelete();
            String query = "DELETE FROM table1 WHERE id IN (" + idsToDelete + " )";
            try {
                mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(query);
            } catch (org.springframework.dao.DeadlockLoserDataAccessException e) {
                LOG.info("Caught DEADLOCK : " + e);
                retryDeadlock(query); // Retry them!
            }
        } catch (Exception e) {
            LOG.error(e);
        }
    }

    // Takes a single SQL string so that it matches the call in write().
    public void retryDeadlock(final String sqlQuery) {
        RetryTemplate template = new RetryTemplate();
        // Keep retrying until 30 seconds have passed.
        TimeoutRetryPolicy policy = new TimeoutRetryPolicy();
        policy.setTimeout(30000L);
        template.setRetryPolicy(policy);
        try {
            template.execute(new RetryCallback<int[]>() {
                public int[] doWithRetry(RetryContext context) {
                    LOG.info("Retrying DEADLOCK " + context);
                    return mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(sqlQuery);
                }
            });
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
Another solution could be to use Spring Batch's multiple-step mechanism.
The DELETE queries are split into three steps: the first step deletes by the blocking column, and the other two steps delete by the two remaining columns respectively.
Step 1: Delete the rows matching id_trg from the child table;
Step 2: Delete the rows matching id_src from the child table;
Step 3: Delete the rows matching id from the parent table;
Of course the last two steps could be merged into one, but in that case two distinct ItemWriters would be needed!
I am developing a client in Java. It communicates with the server via actions. Actions are social-network-style actions (for example, a user viewing the profile of another user).
In the View Profile example above, the client executes 4 queries to get the data from the database server. To provide consistency, I want to put the 4 queries in a transaction. So in my View Profile function, I first call conn.setAutoCommit(false), then query the data, and at the end, before returning, I set auto-commit back to true with conn.setAutoCommit(true) (see the code snippet below).
    try {
        // set auto commit to false to manually handle transaction
        conn.setAutoCommit(false);
        // execute query 1
        // ...
        // execute query 2
        // ...
        // execute query 3
        // ...
        // execute query 4
        // ...
        // set auto commit to true again to not affect other actions
        conn.setAutoCommit(true);
    } catch (SQLException e) {
        e.printStackTrace(System.out);
    } finally {
        try {
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace(System.out);
        }
    }
However, when I run the code, I sometimes notice that the data returned from this action is not consistent. When I combine the 4 queries into a single query, I do get consistent data.
My question is: does setting autoCommit in Java really work for read-only transactions like the one in my example, where I want to issue separate queries to the DBMS? If not, how can I get consistency if I want to query the DBMS with 4 separate queries?
FYI, the database server I use is Oracle DB.
In Oracle, selects never do dirty reads, so they are always implicitly TRANSACTION_READ_COMMITTED. If you are ingesting data at a high rate, my guess is that the data is changing between the first and the last select, so your best bet would be to combine the selects into one using 3 UNIONs.
See http://www.oracle.com/technetwork/issue-archive/2005/05-nov/o65asktom-082389.html
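To make the combined-query suggestion concrete, here is a minimal sketch. Oracle gives each individual statement a consistent view of the data, so one statement with four branches sees a single snapshot. All table and column names (users, friendships, posts, friend_requests, and their columns) are hypothetical placeholders, since the question does not show its schema, and UNION ALL is used here so that duplicate rows are kept.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ProfileViewQuery {
    // Hypothetical schema. Each branch tags its rows with a source label so the
    // caller can split the combined result again; all branches return the same
    // two-column shape (label, string value).
    static final String SQL =
          "SELECT 'profile' AS src, name AS val FROM users WHERE user_id = ? "
        + "UNION ALL SELECT 'friend', friend_name FROM friendships WHERE user_id = ? "
        + "UNION ALL SELECT 'post', title FROM posts WHERE user_id = ? "
        + "UNION ALL SELECT 'pending', requester_name FROM friend_requests WHERE user_id = ?";

    /** Runs the combined query once and groups the values by their source label. */
    static Map<String, List<String>> load(Connection conn, long userId) throws SQLException {
        Map<String, List<String>> bySource = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (int i = 1; i <= 4; i++) {
                ps.setLong(i, userId); // the same user id is bound in every branch
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    bySource.computeIfAbsent(rs.getString("src"), k -> new ArrayList<>())
                            .add(rs.getString("val"));
                }
            }
        }
        return bySource;
    }
}
```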
This question is related to my other question
I am building a Spring web application that reads data from a DB using Hibernate. My app will not be aware of any changes (updates/inserts) made to the DB. Is there a way to use the query cache in such a scenario?
I configured the query cache, and it is not invalidated when I update the DB from a different app. I think that is the expected behavior.
I need the queries to be cached and the cache invalidated when there is an update in the DB. How can I achieve this?
I am not sure whether there is an automatic way to refresh the cache, but I solved this problem in my last project. Expose a method like the one below and give admins access to it. Whenever the DB is modified externally, call this method to refresh your cache.
    public void refreshCache()
    {
        try {
            Map<String, ClassMetadata> classesMetadata = sessionFactory.getAllClassMetadata();
            for (String entityName : classesMetadata.keySet()) {
                sessionFactory.evictEntity(entityName);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
If you are using Oracle, the following query gives you the SCN of the most recent update to the table:
    select max(ora_rowscn) from TableName;
Output:
    10772982279880
You can then convert this to a timestamp if you want:
    select scn_to_timestamp(10772982279880) from dual
But I don't think you need to convert it into a time. Just cache the rowscn itself and check the table periodically; if it has changed, you can evict the cache regions.
Please note that this is supported in Oracle 10g and later.
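The periodic check described above could be sketched like this. ScnChangeDetector is a hypothetical helper, TableName is the placeholder from the query above, and the eviction action is passed in as a Runnable so that it can call whatever cache-clearing method the application exposes (for example the refreshCache() method shown earlier).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.LongSupplier;

/** Sketch: evicts a cache whenever the table's highest ORA_ROWSCN changes. */
final class ScnChangeDetector {
    private final LongSupplier currentScn; // how to read the current SCN
    private final Runnable evictCache;     // what to do when it has changed
    private long lastSeenScn;

    ScnChangeDetector(LongSupplier currentScn, Runnable evictCache) {
        this.currentScn = currentScn;
        this.evictCache = evictCache;
        this.lastSeenScn = currentScn.getAsLong(); // baseline taken at start-up
    }

    /** Call periodically; returns true if a change was detected and the cache evicted. */
    synchronized boolean poll() {
        long scn = currentScn.getAsLong();
        if (scn == lastSeenScn) {
            return false;
        }
        lastSeenScn = scn;
        evictCache.run();
        return true;
    }

    /**
     * Reads the SCN with the query from the answer. It throws SQLException, so a
     * caller has to wrap it before using it as the LongSupplier.
     */
    static long fetchMaxScn(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select max(ora_rowscn) from TableName")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```

One way to drive it would be a ScheduledExecutorService that calls poll() at a fixed interval; the interval then bounds how stale the cache can get.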