prevent period intersection in concurrent insert - java

I have a table with the fields: name|...|start_date|end_date
My code currently is:
select .... 'check for period intersection
insert .... 'if the check succeeds, insert the new row
This code runs in one transaction.
When two users try to insert a new record at the same time with the same fields (and intersecting periods), both records are inserted.
I want to prevent that: the first user's insert must succeed and the other user must get a conflict.
How can I do this?
P.S. I use IBM DB2.

Use an INSERT that takes its data from a SELECT. The SELECT supplies the values to insert, and its WHERE clause performs the check, so that no rows are returned (and nothing is inserted) when the check fails. For example, to insert a row only if id 5 is not in the table sysP:
INSERT INTO test1 (val) SELECT 'test' FROM sysibm.sysdummy1 WHERE NOT EXISTS (SELECT 1 FROM sysP WHERE id = 5)
This query inserts 'test' into test1 only if there is no row with id = 5 in sysP.

You could use a unique key (UK) or SELECT ... FOR UPDATE:
select .... FOR UPDATE WITH RS USE AND KEEP UPDATE LOCKS 'check for period intersection
Update:
Try locking the whole table before the select with:
LOCK TABLE TABLE_NAME IN EXCLUSIVE MODE
This way, the second transaction waits for the first one to commit before it runs its select. EXCLUSIVE MODE blocks select statements from other transactions too, not only updates and inserts.
Update 2:
If "check for period intersection" uses only columns from the same table as the one you're inserting into, then instead of the select add a check constraint to your table. See http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=%2Fcom.ibm.db2.udb.admin.doc%2Fdoc%2Ft0004984.htm
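Whichever locking approach is used, the "check for period intersection" itself comes down to a simple predicate. Here is a minimal Java sketch of it, assuming both ends of a period are inclusive (the question does not say). The class and method names are made up for illustration. In SQL the same test would be roughly `start_date <= :new_end AND end_date >= :new_start`, where the two parameter names are placeholders.

```java
import java.time.LocalDate;

class PeriodOverlap {
    // True when the inclusive periods [startA, endA] and [startB, endB] share at least one day.
    static boolean overlaps(LocalDate startA, LocalDate endA,
                            LocalDate startB, LocalDate endB) {
        // Two periods intersect unless one of them starts after the other has ended.
        return !startA.isAfter(endB) && !startB.isAfter(endA);
    }

    public static void main(String[] args) {
        LocalDate jan01 = LocalDate.of(2024, 1, 1);
        LocalDate jan05 = LocalDate.of(2024, 1, 5);
        LocalDate jan10 = LocalDate.of(2024, 1, 10);
        LocalDate jan20 = LocalDate.of(2024, 1, 20);
        LocalDate feb01 = LocalDate.of(2024, 2, 1);

        System.out.println(overlaps(jan01, jan10, jan05, jan20)); // true
        System.out.println(overlaps(jan01, jan10, jan20, feb01)); // false
    }
}
```

The predicate alone does not solve the concurrency problem. It only decides whether a conflict exists, so it still has to run under one of the locks described above.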

Sounds like MERGE is exactly what you want, when combined with some error raising. I'm assuming you're using DB2 on Linux/Unix/Windows, but MERGE has been on the Mainframe DB2 since v9.1 as well.
MERGE INTO YOUR_TABLE YT
USING (
VALUES ('val1', 'val2', 'val3')
) MG(v1, v2, v3)
ON (YT.v1 = MG.v1)
WHEN MATCHED THEN
SIGNAL SQLSTATE '70001'
SET MESSAGE_TEXT = 'Record already exists!'
WHEN NOT MATCHED THEN
INSERT(v1, v2, v3)
VALUES(MG.v1, MG.v2, MG.v3)
ELSE IGNORE;
The USING clause can be used with provided values (like I have here), or it could be a sub-select. There are other examples on the MERGE page in the Information Center.

Why is the row id increased by one (or more) even after one (or more) ConstraintViolationException? [duplicate]

I have a table with an auto-increment primary key. It is meant to store millions of records, and I don't need to delete anything for now. The problem is that when inserts fail because of some error, the auto-increment key leaves gaps in the ids. For example, after 5 the next id is 8, skipping 6 and 7. As a result, counting the rows gives 28000 while the max id is 58000. What can be the reason? I am not deleting anything. And how can I fix this?
P.S. I am using INSERT IGNORE when inserting records so that no error is raised when I try to insert a duplicate entry into a unique column.
This is by design and will always happen.
Why?
Take two overlapping transactions that are doing INSERTs:
Transaction 1 does an INSERT, gets the value (let's say 42), does more work
Transaction 2 does an INSERT, gets the value 43, does more work
Then:
Transaction 1 fails and rolls back. 42 stays unused
Transaction 2 completes with 43
If consecutive values were guaranteed, every transaction would have to run one after the other. Not very scalable.
Also see "Do Inserted Records Always Receive Contiguous Identity Values" (SQL Server, but the same principle applies).
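The sequence of events above can be replayed in a few lines of plain Java. This is only a toy model, not how any database implements it: the `AtomicLong` stands in for the auto-increment counter, the list stands in for the committed rows, and all names are hypothetical. The point it illustrates is that a value is consumed once it is handed out, whether or not the transaction commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class SequenceGapDemo {
    // Stand-in for the auto-increment counter: values are handed out and never taken back.
    private final AtomicLong counter;
    // Stand-in for the rows that were actually committed.
    private final List<Long> committedIds = new ArrayList<>();

    SequenceGapDemo(long firstValue) {
        this.counter = new AtomicLong(firstValue);
    }

    long nextId() {
        return counter.getAndIncrement();
    }

    void commit(long id) {
        committedIds.add(id);
    }

    // Replays the scenario: transaction 1 takes 42 and rolls back, transaction 2 takes 43 and commits.
    static List<Long> run() {
        SequenceGapDemo table = new SequenceGapDemo(42);
        long t1 = table.nextId();   // transaction 1 gets 42
        long t2 = table.nextId();   // transaction 2 gets 43
        // Transaction 1 rolls back: t1 is never committed and 42 is not reused.
        table.commit(t2);           // transaction 2 commits 43
        table.commit(table.nextId()); // a later insert gets 44, not 42
        return table.committedIds;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [43, 44]
    }
}
```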
You can create a trigger to handle the auto increment:
CREATE DEFINER=`root`@`localhost` TRIGGER `mytable_before_insert` BEFORE INSERT ON `mytable` FOR EACH ROW
BEGIN
SET NEW.id = (SELECT IFNULL(MAX(id), 0) + 1 FROM mytable);
END
This comes from InnoDB, the storage engine of MySQL.
It really isn't a problem: the docs on "AUTO_INCREMENT Handling in InnoDB" basically say that InnoDB keeps an auto-increment counter in memory and initializes it at startup.
The query it uses for that is something like
SELECT MAX(ai_col) FROM t FOR UPDATE;
This improves concurrency without really having an effect on your data.
To avoid the gaps, use MyISAM instead of InnoDB as the storage engine.
Perhaps (I haven't tested this) a solution is to set innodb_autoinc_lock_mode to 0.
According to http://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html this might make things a bit slower (if you perform inserts of multiple rows in a single query) but should remove gaps.
You can try an insert like:
insert ignore into table select (select max(id)+1 from table), "value1", "value2" ;
This will:
insert the new data with the next unused id (not auto-increment)
ignore the row if a duplicate entry is found in the unique fields
otherwise insert the new data normally
(but this method does not support updating fields when a duplicate entry is found)

Get details about SQL updated records

I need suggestions on an approach. The application has an update query like the one below:
UPDATE TABLE
SET FLAG = CASE
WHEN FLAG = 'IP' THEN 'P'
WHEN FLAG = 'IH' THEN 'H'
WHEN FLAG = 'IM' THEN 'M'
END
WHERE ADJUSTMENT_ID IN (SELECT Query )
This update is executed from a Java function that returns void.
I now have a requirement to also get details of the updated records (a few columns from table TABLE) and return a List from the function instead of void.
Running a SELECT first and then updating the records in a loop is not an option for performance reasons. The records are updated with a single UPDATE statement because it's supposed to run faster.
What options do I have that keep performance comparable? Should I go with a stored procedure?
SELECT ... FROM FINAL TABLE (UPDATE ....)
will do the job. As it is a single SQL statement, the performance will also be good.
See also
http://www.idug.org/p/bl/et/blogid=278&blogaid=422

Transfer mysql binary log into select in CDC

I would like to do real-time reading from MySQL.
The idea is simple: I use the binary log to trigger a select statement.
I'd like to read only the new rows on every change.
For now I only consider inserts.
So when someone does
insert into sometable(uid,somecolumn) values(uid,something)
my code will be triggered and do
select * from sometable where uid=uid
Of course I have already recorded which columns are the primary key, because the binlog does not seem to carry that information.
I cannot find a tool to parse MySQL insert statements, so I use a regex to find out which column equals which value and then extract the primary keys.
But the real problem is what happens if I do
Insert into `table` (`col`) values (select 0 as `col` from `dummy`);
How can I find out that col=0?
Is it impossible to build a select statement that selects the newly changed rows, triggered by the insert statement?
In a TRIGGER, you have access to the OLD and NEW values. With them, you can write code (in the TRIGGER) to log, for example, just the changes. Something like...
IF NEW.col1 != OLD.col1 THEN INSERT INTO LOG ...; END IF;
IF NEW.col2 != OLD.col2 THEN INSERT INTO LOG ...; END IF;

Insert with Subselect - Atomic Operation?

I know that MySQL supports auto-increment values, but not dependent auto-increment values.
I.e. if you have a table like this:
id | element | innerId
1 | a | 1
2 | a | 2
3 | b | 1
and you insert another b-element, you need to compute the innerId on your own (the expected insert would be "2").
Is there a database supporting something like this?
What would be the best way to achieve this behaviour? I do not know the number of elements, so I cannot create dedicated tables for them from which I could just derive an id.
(The example is simplified.)
The goal is that every element "type" (the number of types is unknown, possibly infinite - 1) should have its own gap-less id.
If I use something like
INSERT INTO
myTable t1
(id,element, innerId)
VALUES
(null, 'b', (SELECT COUNT(*) FROM myTable t2 WHERE t2.element = "b") +1)
http://sqlfiddle.com/#!2/2f4543/1
Will this return the expected result under all circumstances? I mean it works, but what about concurrency? Are inserts with subselects still atomic, or might there be a scenario where two inserts try to insert the same id? (Especially if a transactional insert is pending?)
Would it be better to try to achieve this with the programming language (e.g. Java)? Or is it easier to implement this logic as close to the database engine as possible?
Since I'm using an aggregate to compute the next innerId, I think using SELECT ... FOR UPDATE cannot avoid the problem when other transactions have pending commits, right?
P.S.: I could of course just brute-force the insert, starting at the current max value per element, with a unique key constraint on (element, innerId), retrying until there is no unique-key violation. But isn't there a nicer way?
According to "Make one ID with auto_increment depending on another ID - possible?" it would be possible with a composite primary key on, in my case, innerId and element. But according to "Setting MySQL auto_increment to be dependent on two other primary keys", that works only for MyISAM (I have InnoDB).
Now I'm even more confused. I tried inserting data from two different PHP scripts using the query above. Script one has a 15-second "sleep" to let me call script two (which should simulate the concurrent modification). The result was correct when using one query.
(ps.: mysql(?!i)-functions only for quick debugging)
Base Data:
Script 1:
mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', (SELECT MAX(t2.innerID) FROM insertTest t2 WHERE element='a') +1, 'page1')");
sleep(15);
//mysql_query("ROLLBACK;");
mysql_query("COMMIT;");
Script 2:
//mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', (SELECT MAX(t2.innerID) FROM insertTest t2 WHERE element='a') +1, 'page2')");
//mysql_query("COMMIT;");
I would have expected the page2 insert to happen before the page1 insert, because it's running without any transaction. But in fact the page1 insert happened FIRST, and the second script was also delayed for about 15 seconds...
(ignore the AC-Id, played around a bit)
When using ROLLBACK in the first script, the second script is still delayed for 15 seconds and then picks up the correct innerId:
So:
Non-transactional inserts are blocked while a transaction is active.
Inserts with subselects seem to be blocked as well.
So in the end it seems an insert with a subselect is an atomic operation? Or why else would the SELECT of the second page have been blocked?
Using the selection and the insert as separate, non-transactional statements like this (on page 2, simulating the concurrent modification):
$nextId = mysql_query("SELECT MAX(t2.innerID) as a FROM insertTest t2 WHERE element='a'");
$nextId = mysql_fetch_array($nextId);
$nextId = $nextId["a"] +1;
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', $nextId, 'page2')");
leads to the error I was trying to avoid:
So why does it work in the concurrent scenario when each modification is one query? Are inserts with subselects atomic?
Well, all (or almost all) databases support the functionality needed to calculate innerid according to your rules. It is called a trigger, specifically a before insert trigger.
Your particular version will not work consistently in a multi-user environment. Few, if any, databases generate read locks on a table when starting an insert. That means that two insert statements issued very close together would generate the same value for innerid.
Because of concurrency considerations, you should do this calculation in the database, using triggers rather than on the application side.
You always have the possibility of calculating innerid when you need it, rather than when you insert the value. This is computationally expensive, requiring either an order by (using variables) or a correlated subquery. Other databases support window/analytic functions, making such a calculation much easier to express.
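The failure mode of the separate read-then-insert pattern from the question can be replayed deterministically in plain Java. This is only a toy model under stated assumptions: the list stands in for the innerId values of one element, the two "sessions" are ordinary method calls in a fixed order, and all names are hypothetical. It says nothing about how a particular engine locks rows. It only shows why an unprotected "read MAX, then insert" produces a duplicate when both reads happen before either insert.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InnerIdRaceDemo {
    // Equivalent of SELECT MAX(innerId) + 1 for one element; 1 when there are no rows yet.
    static int nextInnerId(List<Integer> innerIds) {
        return innerIds.isEmpty() ? 1 : Collections.max(innerIds) + 1;
    }

    // Both sessions read before either inserts, so the second read misses the first insert.
    static List<Integer> interleaved() {
        List<Integer> innerIds = new ArrayList<>(List.of(1, 2));
        int sessionA = nextInnerId(innerIds); // 3
        int sessionB = nextInnerId(innerIds); // 3 as well
        innerIds.add(sessionA);
        innerIds.add(sessionB);
        return innerIds;
    }

    // Each session reads and inserts before the next one starts, as if serialized by a lock.
    static List<Integer> serialized() {
        List<Integer> innerIds = new ArrayList<>(List.of(1, 2));
        innerIds.add(nextInnerId(innerIds)); // 3
        innerIds.add(nextInnerId(innerIds)); // 4
        return innerIds;
    }

    public static void main(String[] args) {
        System.out.println(interleaved()); // [1, 2, 3, 3]
        System.out.println(serialized());  // [1, 2, 3, 4]
    }
}
```

A unique key on (element, innerId) turns the duplicate in the interleaved case into an error that the application can retry on.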
From what I read in "Atomicity multiple MySQL subqueries in an INSERT/UPDATE query?", your query seems to be atomic. I tested it on MySQL with InnoDB, with 4 different programs each trying to execute the query 100000 times. Afterwards I was able to create a combined unique key on (element, innerid), so it doesn't seem to have generated a duplicate. However, I got:
Deadlock found when trying to get lock
So you might want to consider this http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html
EDIT: It seems I could circumvent the deadlock by changing the SQL to
INSERT INTO test (id,element,innerId)
VALUES(null, "b",
(SELECT Count(*) FROM test t2 WHERE element = 'b' FOR UPDATE )+1);

Reading and writing CSV File into database

I am implementing an application-specific feature to import data from one database to another.
I have a CSV file containing, say, 10000 rows. These rows need to be inserted into or updated in the database.
I am using a MySQL database and inserting from Java.
Some rows may already be present in the database, in which case they need to be updated. Rows not present in the database need to be inserted.
One possible solution is to read the file line by line, check for the entry in the database, and build insert/update queries accordingly. But creating and executing these queries one by one may take a long time, and sometimes my CSV file may have millions of records.
Is there a faster way to achieve this?
I don't know how you determine "is already present", but if it's any kind of database-level constraint (probably a primary key?), you can use the REPLACE INTO statement. It inserts the record unless that would violate a primary or unique key, in which case it deletes the conflicting record and inserts the new one in its place.
It works basically just like INSERT:
REPLACE INTO table ( id, field1, field2 )
VALUES ( 1, 'value1', 'value2' )
If a row with ID 1 exists, it's replaced with these values; otherwise it's created.
Given that you're using MySQL, you could use the INSERT ... ON DUPLICATE KEY UPDATE ... statement, which functions similarly to the SQL standard MERGE statement (see the MySQL documentation and the Wikipedia article on SQL MERGE). The statement would look something like
INSERT INTO MY_TABLE
(PRIMARY_KEY_COL, COL2, COL3, COL4)
VALUES
(1, 2, 3, 4)
ON DUPLICATE KEY
UPDATE COL2 = 2,
COL3 = 3,
COL4 = 4
In this example I'm assuming that PRIMARY_KEY_COL is a primary or unique key on MY_TABLE. If the INSERT statement would fail due to a duplicate value on the primary or unique key, then the UPDATE clause is executed. Also note (on the MySQL doc page) that there are some gotchas associated with auto-increment columns on an InnoDB table.
Share and enjoy.
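Since the question inserts from Java, here is a rough sketch of how that statement could be generated once for a batch. The helper name `UpsertSqlBuilder.build` and its parameters are made up for illustration, and it assumes table and column names come from trusted code, because they are concatenated without any escaping. The resulting string would typically be passed to `Connection.prepareStatement` and executed with `addBatch()` / `executeBatch()` per chunk of CSV rows instead of one query per row. `VALUES(col)` in the UPDATE clause refers to the value the row would have inserted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class UpsertSqlBuilder {
    // Builds a parameterized MySQL upsert for one key column plus the remaining columns.
    static String build(String table, String keyColumn, List<String> otherColumns) {
        List<String> allColumns = new ArrayList<>();
        allColumns.add(keyColumn);
        allColumns.addAll(otherColumns);

        String columnList = String.join(", ", allColumns);
        // One "?" placeholder per column, to be bound through PreparedStatement setters.
        String placeholders = allColumns.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        // On a key collision, overwrite every non-key column with the incoming value.
        String updates = otherColumns.stream()
                .map(c -> c + " = VALUES(" + c + ")")
                .collect(Collectors.joining(", "));

        return "INSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    public static void main(String[] args) {
        System.out.println(build("MY_TABLE", "PRIMARY_KEY_COL", List.of("COL2", "COL3", "COL4")));
    }
}
```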
Do you need to do this often or just once in a while?
I need to load CSV files into a database for analysis from time to time, and I created an SSIS solution with a Data Flow task that loads the CSV file into a table on SQL Server.
For more information, see this blog:
http://blog.sqlauthority.com/2011/05/12/sql-server-import-csv-file-into-database-table-using-ssis/
Add a stored procedure in SQL for inserting. In the stored procedure use a try catch block to do the insert. If the insert fails do an update. Then you can simply call this method from your program.
Alternatively:
UPDATE Table1 SET (...) WHERE Column1='SomeValue'
IF @@ROWCOUNT=0
INSERT INTO Table1 VALUES (...)
