Cassandra Delete setting non-key fields to null - java

We are running batched delete statements on 5 tables using the DataStax Java driver. We noticed that on some occasions the record doesn't get deleted from one of the tables; instead the non-key fields are being set to null.
We have been using the key columns on this table for a duplicate check, so when we later create a new record with the same key values, it fails our duplicate check.
Is this expected behavior? If yes, can we override it so the row is physically deleted (the behavior we expect) as part of our batch execution?
Also, why does it always happen with just 1 table (always the same one) out of the total 5?
Table definition:
CREATE TABLE customer_ks.customer_by_accountnumber (
    customerid text,
    accountnumber text,
    accounttype text,
    age int,
    name text,
    PRIMARY KEY (customerid, accountnumber)
);
and here is the query I run on this table as part of the batch:
DELETE FROM customer_by_accountnumber WHERE customerid=? AND accountnumber=?
along with deletes on 4 other tables..
try {
    LOG.debug("Deleting customer " + customer);
    BatchStatement batch = new BatchStatement();
    batch.add(customerDeleteStmt.bind(customer.getId()));
    batch.add(customerByAccountNumberDeleteStmt.bind(customer.getId(), customer.getAcctNum()));
    // 3 more cleanup statements in the batch..
    batch.add(...);
    batch.add(...);
    batch.add(...);
    cassandraTemplate.execute(batch);
    LOG.info("Deleted Customer " + customer);
} catch (Exception e) {
    LOG.error("Exception in batched Delete.", e);
    throw new DatabaseException("Error: CustomerDAO.delete failed. Exception is: " + e.getMessage());
}
UPDATE:
This doesn't seem like a delete issue as I suspected initially.
Upon investigation it turned out that the batched delete worked as expected. What caused the anomaly in this table was an update (issued after the batched delete, for the same row) that set one column to an empty string (not null). In that case Cassandra issues an insert (instead of the no-op we expected), resulting in a new row with null values for the non-key columns and an empty value for the updated column.
Changed the code to set the column value to null in the update statement, and that fixed our issue.
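For illustration, here is a minimal sketch of that behavior with the DataStax Java driver. The session variable, the prepared statement name and the bound values are assumptions for the example; accounttype is one of the non-key columns from the table above. Because every UPDATE in Cassandra is an upsert, binding an empty string (a real value) re-creates the row even after it has been deleted, whereas binding null only writes a tombstone for that cell and does not bring the row back.
// Hypothetical prepared update -- names are illustrative, not from the original code.
PreparedStatement updateType = session.prepare(
        "UPDATE customer_by_accountnumber SET accounttype = ? "
      + "WHERE customerid = ? AND accountnumber = ?");

// Re-creates the previously deleted row: "" is a normal value, and UPDATEs are upserts.
session.execute(updateType.bind("", customerId, accountNumber));

// Writes a tombstone for the cell instead; the deleted row stays deleted.
session.execute(updateType.bind(null, customerId, accountNumber));
// (BoundStatement.setToNull(...) is another way to bind an explicit null.)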
Please advise how I can mark this as a non-issue/resolved (whichever is appropriate).
thanks.

Related

If an insert statement gives a duplicate key exception (row id=1 found in table), how to update the statement in JDBC (PostgreSQL)

I have stored a bunch of insert statements in an ArrayList, like below:
List<String> script = new ArrayList<String>();
script.add("INSERT INTO PUBLIC.EMPLOYEE(ID, NAME) VALUES (1, 'Madhava')");
script.add("INSERT INTO PUBLIC.EMPLOYEE(ID, NAME) VALUES (2, 'Rao')");
script.add("INSERT INTO PUBLIC.ADDRESS(ID, CITY) VALUES (1, 'Bangalore')");
script.add("INSERT INTO PUBLIC.ADDRESS(ID, CITY) VALUES (2, 'Hyd')");
I created a connection to PostgreSQL using JDBC and execute the statements in a for loop, like below:
try {
    Connection con = DBConnections.getPostgresConnection();
    Statement statement = con.createStatement();
    for (String query : script) {
        statement.executeUpdate(query);
    }
} catch (Exception e) {
    e.printStackTrace();
}
If a record already exists in the Postgres DB, I get a duplicate key exception:
org.postgresql.util.PSQLException: ERROR: duplicate key value
violates unique constraint "reports_uniqueness_index"
How can I update the same record with an update query in Postgres when that happens?
Is there any way to solve this ?
Is there any other better way to solve this?
Could you please explain...
executeUpdate sends a DML statement over to the database. Your database must already have a record which uses one of the primary keys, either in the employee or address table.
You have to ensure you don't violate the primary key constraint; violating the constraint is what results in the exception.
Either change your query to an update statement, or delete the records which are causing the conflict.
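As a rough sketch of the "change it to an update" route, assuming the EMPLOYEE table from the question, an open Connection con, and plain JDBC (SQLState 23505 is PostgreSQL's unique-violation code; the values are just the ones from the question):
String insertSql = "INSERT INTO PUBLIC.EMPLOYEE(ID, NAME) VALUES (?, ?)";
String updateSql = "UPDATE PUBLIC.EMPLOYEE SET NAME = ? WHERE ID = ?";

// Try the insert first; on a unique/primary-key violation fall back to an update.
try (PreparedStatement insert = con.prepareStatement(insertSql)) {
    insert.setInt(1, 1);
    insert.setString(2, "Madhava");
    insert.executeUpdate();
} catch (SQLException e) {
    if ("23505".equals(e.getSQLState())) {              // unique_violation
        try (PreparedStatement update = con.prepareStatement(updateSql)) {
            update.setString(1, "Madhava");
            update.setInt(2, 1);
            update.executeUpdate();
        }
    } else {
        throw e;
    }
}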
There is no way to get the key that caused the exception (though you can probably parse the error message, which is certainly not recommended).
Instead, you should try to prevent this from ever happening. There are at least 3 easy ways to accomplish this.
Make the database generate the id column
(in PostgreSQL you can use a serial type, which is basically an auto-incrementing int):
CREATE TABLE employee
(
    id serial NOT NULL
    -- other columns here
);
Your insert will now look like:
script.add("INSERT INTO PUBLIC.EMPLOYEE(NAME) VALUES ('Madhava')"); // no ID here
Create a sequence and have your JDBC code call the sequence's nextval function:
script.add("INSERT INTO PUBLIC.EMPLOYEE(ID, NAME) VALUES (nextval('your_seq_name'), 'Madhava')");
Create a unique ID in Java (least recommended):
script.add("INSERT INTO PUBLIC.EMPLOYEE(ID, NAME) VALUES ('" + UUID.randomUUID() + "', 'Madhava')"); // or Math.random() etc.

How to find the primary key of deleted records with JDBC?

I want to know how to find the primary key of the deleted records through the JDBC connection.
If my query like following then what will be the primary key of deleted records?
String sqlDelete = "DELETE FROM devicesequences WHERE deviceId = 20";
What is the primary key of the deleted record?
You need a second query. And of course you need to execute it before running the DELETE query.
SELECT id FROM devicesequences WHERE deviceId = 20;
assuming that id is the name of your primary key column.
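A minimal JDBC sketch of that approach, assuming con is an open Connection, the usual java.sql/java.util imports, and that id is an integer primary key column as above:
// First collect the primary keys of the rows that are about to be deleted...
List<Integer> deletedIds = new ArrayList<>();
try (PreparedStatement select = con.prepareStatement(
        "SELECT id FROM devicesequences WHERE deviceId = ?")) {
    select.setInt(1, 20);
    try (ResultSet rs = select.executeQuery()) {
        while (rs.next()) {
            deletedIds.add(rs.getInt("id"));
        }
    }
}

// ...then run the delete itself.
try (PreparedStatement delete = con.prepareStatement(
        "DELETE FROM devicesequences WHERE deviceId = ?")) {
    delete.setInt(1, 20);
    delete.executeUpdate();
}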
Since you can delete any arbitrary row (or set of rows) at any time: no, there's no way to tell WHICH row (or rows) you most recently deleted.
You can, however, create a "trigger" to save this information for you.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html

Avoiding MySQL Deadlocks in a multithreaded Spring app

The scenario is simple.
I have a somewhat large MySQL db containing two tables:
-- Table 1
id (primary key) | some other columns without constraints
-----------------+--------------------------------------
1 | foo
2 | bar
3 | foobar
... | ...
-- Table 2
id_src | id_trg | some other columns without constraints
-------+--------+---------------------------------------
1 | 2 | ...
1 | 3 | ...
2 | 1 | ...
2 | 3 | ...
2 | 5 | ...
...
On table1 only id is a primary key. This table contains about 12M entries.
On table2, id_src and id_trg together form the primary key; both have foreign key constraints on table1's id with ON DELETE CASCADE enabled. This table contains about 110M entries.
OK, now all I'm doing is creating a list of ids that I want to remove from table1 and then executing a simple DELETE FROM table1 WHERE id IN (<the list of ids>);
As you may have guessed, this deletes the corresponding rows from table2 as well. So far so good, but the problem is that when I run this in a multi-threaded environment I get many deadlocks!
A few notes:
There is no other process running at the same time nor will be (for the time being)
I want this to be fast! I have about 24 threads (if this does make any difference in the answer)
I have already tried almost all of the transaction isolation levels (except TRANSACTION_NONE): Java sql connection transaction isolation
Ordering/sorting the id's I think would not help!
I have already tried SELECT ... FOR UPDATE, but a simple DELETE would take up to 30 secs (so there is no point in using it):
DELETE FROM table1
WHERE id IN (
    SELECT id FROM (
        SELECT * FROM table1
        WHERE id='some_id'
        FOR UPDATE) AS x);
How can I fix this?
I would appreciate any help and thanks in advance :)
Edit:
Using InnoDB engine
On a single thread this process would take a dozen hours even maybe a whole day, but I'm aiming for a few hours!
I'm already using a connection pool manager: java.util.concurrent
For explanation on double nested SELECTs please refer to MySQL can’t specify target table for update in FROM clause
The list that is to be deleted from the DB may contain a couple of million entries in total, which are divided into chunks of 200
I use the FOR UPDATE clause because I've heard that it locks a single row instead of locking the whole table
The app uses Spring's batchUpdate(String sqlQuery) method, thus the transactions are managed automatically
All id columns are indexed and the ids are unique, 50 chars max!
ON DELETE CASCADE on id_src and id_trg (each separately) means that every delete on table1 id=x leads to deletes on table2 id_src=x and id_trg=x
Some code as requested:
public void write(List data){
    try {
        ArrayList idsToDelete = getIdsToDelete();
        String query = "DELETE FROM table1 WHERE id IN (" + idsToDelete + " )";
        mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(query);
    } catch (Exception e) {
        LOG.error(e);
    }
}
and mysqlJdbcTemplate is just an abstract class that extends JdbcDaoSupport.
First of all, your simple delete query in which you are passing ids should not create a problem if you are passing ids up to a limit like 1,000 (the number of affected rows in the child table should also be of that order, not too many like 10,000 etc.), but if you are passing 50,000 or more then it can create locking issues.
To avoid deadlocks, you can follow the approach below (assuming bulk deletion will not be part of the production system):
Step 1: Fetch all ids with a select query and keep them in a cursor.
Step 2: Now delete these ids stored in the cursor one by one in a stored procedure.
Note: To check why the deletion is acquiring locks we have to check several things, like how many ids you are passing, what transaction isolation level is set at the DB level, what your MySQL configuration settings in my.cnf are, etc.
It may be dangerous to delete many (> 10,000) parent records each having child records deleted by cascade, because the more records you delete at a time, the higher the chance of lock conflicts leading to deadlocks or rollbacks.
If it is acceptable (meaning you can make a direct JDBC connection to the database) you should (no threading involved here):
compute the list of ids to delete
delete them in batches (between 10 and 100 a priori), committing every 100 or 1000 records (a sketch of this follows below)
As the heavier job should be on the database side, I highly doubt that threading will help here. If you want to try it, I would recommend:
one single thread (with a dedicated database connection) computing the list of ids to delete and feeding a synchronized queue with them
a small number of threads (4, maybe 8), each with its own database connection, that:
use a prepared DELETE FROM table1 WHERE id = ? in batches
take ids from the queue and prepare the batches
send a batch to the database every 10 or 100 records
do a commit every 10 or 100 batches
I cannot imagine that the whole process could take more than several minutes.
After some other reading, it looks like I was used to old systems and that my numbers are really conservative.
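A minimal sketch of that chunked, explicitly committed delete with plain JDBC; the method name, chunk size and commit interval are illustrative assumptions, and the usual java.sql/java.util imports are assumed:
// Delete ids in small batches on a single dedicated connection,
// committing regularly to keep each transaction (and its locks) short.
void deleteInChunks(Connection con, List<String> idsToDelete) throws SQLException {
    final int batchSize = 100;        // rows per JDBC batch (assumption)
    final int batchesPerCommit = 10;  // commit every N batches (assumption)
    con.setAutoCommit(false);
    try (PreparedStatement ps = con.prepareStatement("DELETE FROM table1 WHERE id = ?")) {
        int inBatch = 0, batchesSinceCommit = 0;
        for (String id : idsToDelete) {
            ps.setString(1, id);
            ps.addBatch();
            if (++inBatch == batchSize) {
                ps.executeBatch();
                inBatch = 0;
                if (++batchesSinceCommit == batchesPerCommit) {
                    con.commit();   // cascaded child-row deletes are committed too
                    batchesSinceCommit = 0;
                }
            }
        }
        ps.executeBatch();  // flush any remaining ids
        con.commit();
    }
}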
Ok, here's what I did; it might not actually avoid deadlocks, but it was my only option at the time.
This solution is actually a way of handling MySQL deadlocks using Spring.
Catch and retry deadlocks:
public void write(List data){
    try {
        ArrayList idsToDelete = getIdsToDelete();
        String query = "DELETE FROM table1 WHERE id IN (" + idsToDelete + " )";
        try {
            mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(query);
        } catch (org.springframework.dao.DeadlockLoserDataAccessException e) {
            LOG.info("Caught DEADLOCK : " + e);
            retryDeadlock(new String[] { query }); // Retry them!
        }
    } catch (Exception e) {
        LOG.error(e);
    }
}
public void retryDeadlock(final String[] sqlQuery) {
    RetryTemplate template = new RetryTemplate();
    TimeoutRetryPolicy policy = new TimeoutRetryPolicy();
    policy.setTimeout(30000L);
    template.setRetryPolicy(policy);
    try {
        template.execute(new RetryCallback<int[]>() {
            public int[] doWithRetry(RetryContext context) {
                LOG.info("Retrying DEADLOCK " + context);
                return mysqlJdbcTemplate.getJdbcTemplate().batchUpdate(sqlQuery);
            }
        });
    } catch (Exception e1) {
        e1.printStackTrace();
    }
}
Another solution could be to use Spring's multiple-step mechanism, so that the DELETE queries are split into 3: the first step deletes from the blocking column, and the other steps delete from the two other columns respectively.
Step 1: Delete id_trg from the child table;
Step 2: Delete id_src from the child table;
Step 3: Delete id from the parent table;
Of course the last two steps could be merged into one, but in that case two distinct ItemWriters would be needed!
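Roughly, the ordering of those three steps would look like the following, written here as plain JdbcTemplate calls rather than full Spring Batch steps; ids stands for the comma-separated chunk of ids being processed and is an assumption of the sketch:
// Step 1: remove child rows that reference the chunk via id_trg
jdbcTemplate.update("DELETE FROM table2 WHERE id_trg IN (" + ids + ")");
// Step 2: remove child rows that reference the chunk via id_src
jdbcTemplate.update("DELETE FROM table2 WHERE id_src IN (" + ids + ")");
// Step 3: finally remove the parent rows (nothing is left to cascade)
jdbcTemplate.update("DELETE FROM table1 WHERE id IN (" + ids + ")");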

Mysql Duplicate entry 'xxxxxxxx' for key(unique) 'xxxxxxxxxx'

I have a problem updating a row. I have a column called serialNum with varchar(50) not null unique default null.
When I get the response data from the partner company, I update the row according to the unique serial_num (our company's serial num).
Sometimes the update fails because of:
Duplicate entry 'xxxxxxxx' for key 'serialNum'
But the value to update does not exist when I search the whole table. It happens sometimes, not always, maybe about 10 times out of 300.
Why does this happen and how can I solve it?
Below is the query I use to update:
String updateQuery = "UPDATE phone SET serialNum=?, Order_state=?, Balance=? WHERE Serial_num=?";
PreparedStatement presta = con.prepareStatement(updateQuery);
presta.setString(1, resultSet.get("oid_goodsorder"));
presta.setString(2, "order success");
presta.setFloat(3, Float.valueOf(resultSet.get("leftmoney")));
presta.setString(4, resultSet.get("jno_cli"));
presta.executeUpdate();
I think the reason is in resultSet.get("oid_goodsorder"): where did you get this result? Is 'oid_goodsorder' unique? Do you always update the whole table?
If oid_goodsorder is unique, it is still possible to get duplicates in serialNum, because you don't use a bulk update; instead you update every record separately, so this can happen:
Before:
serialNum = 11, 22, 33, 44
oid_goodsorder = 44, 11, 22, 33
It tries to update the first serialNum to 44, but 44 already exists!
Once all the updates had finished, serialNum would be unique again...
If you want to find the offending rows, you could temporarily make serialNum non-unique and check the table for duplicated serialNum values.
If you don't have duplicate values, try to use a bulk update:
Java - how to batch database inserts and updates
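For reference, a minimal sketch of batching those updates with JDBC, assuming con is the open Connection and rows is the list of response records (as String maps) from the partner company; field names follow the question, and batching by itself only helps if the overlapping values never collide mid-run:
// Queue all the updates and send them to the server in one batch
// instead of one executeUpdate() round trip per record.
String sql = "UPDATE phone SET serialNum=?, Order_state=?, Balance=? WHERE Serial_num=?";
try (PreparedStatement ps = con.prepareStatement(sql)) {
    for (Map<String, String> row : rows) {
        ps.setString(1, row.get("oid_goodsorder"));
        ps.setString(2, "order success");
        ps.setFloat(3, Float.valueOf(row.get("leftmoney")));
        ps.setString(4, row.get("jno_cli"));
        ps.addBatch();
    }
    ps.executeBatch();
}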

remove duplicate values while insertion

Hi, I am trying to insert values from an Excel sheet into a SQL database in Java. The SQL database already has some rows inserted by other means. Now I need to insert new rows from the Excel sheet and eliminate the duplicate values that exist in the database as well as in the Excel sheet. For that I wrote queries like this.
First I inserted the records from the Excel sheet into the SQL database using an insert query:
statement.executeUpdate("INSERT INTO dbo.Company(CName, DateTimeCreated) values ('" + Cname + "', '" + ts + "')");
Later I deleted the duplicate values using a delete query:
String comprows="delete from dbo.Company where Id not in"
+ "(select min(Id) from dbo.Company "
+ "group by CName having count(*)>=1)";
statement3.executeUpdate(comprows);
where Id is autoincremented integer.
but it is not good to do an insert and then a delete.
How do I know the values already exist? If they exist, how do I remove them during insertion?
You can simply fire a SELECT for the CName first. If a record is found, update it, else insert a new record.
Edited to add code snippet:
ResultSet rs = statement.executeQuery("SELECT Id FROM dbo.Company WHERE CName = '" + Cname + "'");
if (rs.next()) {
    // retrieve ID from rs
    // fire an update for this ID
} else {
    // insert a new record.
}
Alternatively, if you think that there are already duplicates on your table and you want to remove them as well..
ResultSet rs = statement.executeQuery("SELECT Id FROM dbo.Company WHERE CName = '" + Cname + "'");
List idList = new ArrayList();
while (rs.next()) {
    // collect IDs from rs in a collection, say idList
}
if (!idList.isEmpty()) {
    // convert the list to a comma separated string, say idsStr
    statement.executeUpdate("DELETE FROM dbo.Company WHERE id IN (" + idsStr + ")");
}
// insert a new record.
statement.executeUpdate("INSERT INTO dbo.Company(CName, DateTimeCreated) values ('" + Cname + "', '" + ts + "')");
Of course good practice is to use PreparedStatement as it would improve performance.
PS: Excuse me for any syntax errors.
One option would be to create a temp table and dump your Excel data there. Then you can write an insert that joins the temp table with the dbo.Company table and only inserts the records that aren't already there.
You could do a lookup on each record you want to insert, but if you are dealing with large volumes that's not a very efficient way to do it, since you will have to do a select and an insert for each record in your Excel spreadsheet.
Merge statements are pretty effective in these types of situations as well. I don't think all databases support them (I know Oracle does for sure). A merge statement is basically a combined insert and update, so you can do the lookup against the final table and insert if not found or update if found. The nice thing about this is you get the efficiency of doing all of it as a set rather than one record at a time.
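For illustration, a merge of that shape might look roughly like the following (SQL Server MERGE syntax, since the dbo. prefix suggests SQL Server; Company_Staging is a hypothetical staging/temp table loaded from the Excel sheet, and the column names follow the question):
// Set-based "insert if missing, update if present" using MERGE.
String mergeSql =
    "MERGE dbo.Company AS target "
  + "USING dbo.Company_Staging AS source "
  + "ON target.CName = source.CName "
  + "WHEN MATCHED THEN UPDATE SET target.DateTimeCreated = source.DateTimeCreated "
  + "WHEN NOT MATCHED THEN INSERT (CName, DateTimeCreated) "
  + "    VALUES (source.CName, source.DateTimeCreated);";
statement.executeUpdate(mergeSql);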
If you can control the DB schema, you might consider putting a unique constraint on whatever column(s) should not be duplicated. When you do your inserts, it'll throw when it tries to add the duplicate data. Catch it before it tosses you all the way out.
It's usually good to enforce constraints like this on the DB itself; that means no one querying the database has to worry about invalid duplicates. Also, optimistically trying the insert first (without doing a separate select first) might be faster.
