Insert with Subselect - Atomic Operation? - java

I know that MySQL supports auto-increment values, but not dependent auto-increment values.
I.e., if you have a table like this:
id | element | innerId
1 | a | 1
2 | a | 2
3 | b | 1
If you insert another b-element, you need to compute the innerId on your own (the expected innerId here would be 2).
Is there a database supporting something like this?
What would be the best way to achieve this behaviour? I do not know the number of elements, so I cannot create dedicated tables for them, from which I could just derive an id.
(The example is simplified.)
The goal is that every element "type" (of which the number is unknown, possibly unbounded) should have its own gap-less id.
If I were to use something like
INSERT INTO
myTable t1
(id,element, innerId)
VALUES
(null, 'b', (SELECT COUNT(*) FROM myTable t2 WHERE t2.element = 'b') + 1)
http://sqlfiddle.com/#!2/2f4543/1
Will this return the expected result under all circumstances? I mean, it works, but what about concurrency? Are inserts with subselects still atomic, or might there be a scenario where two inserts try to insert the same id? (Especially if a transactional insert is pending?)
Would it be better to try to achieve this in the programming language (i.e. Java)? Or is it easier to implement this logic as close to the database engine as possible?
Since I'm using an aggregation to compute the next innerId, I think using SELECT ... FOR UPDATE cannot avoid the problem in case other transactions have pending commits, right?
PS: I could of course just brute-force the insert - starting at the current max value per element, with a unique key constraint on (element, innerId) - retrying until there is no key violation, but isn't there a nicer way?
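For reference, the brute-force variant rests on a constraint like this (a minimal sketch; table and constraint names are just taken from the example, and the retry loop itself would live in the application):

-- the unique key that turns a concurrent duplicate into a catchable error
ALTER TABLE myTable ADD UNIQUE KEY uq_element_inner (element, innerId);
-- the application retries with innerId = max+1, max+2, ...
-- until the insert succeeds instead of raising a duplicate-key error
INSERT INTO myTable (id, element, innerId) VALUES (null, 'b', 2);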
According to "Make one ID with auto_increment depending on another ID - possible?", it would be possible with a composite primary key on - in my case - (element, innerId). But according to "Setting MySQL auto_increment to be dependent on two other primary keys", that works only for MyISAM (I have InnoDB).
Now I'm confused even more. I tried to use two different PHP scripts to insert data, using the query above. Script one sleeps for 15 seconds in order to allow me to call script two (which should simulate the concurrent modification). The result was correct when each modification was a single query.
(PS: the deprecated mysql_* functions, rather than mysqli, are only used for quick debugging.)
Base Data:
Script 1:
mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', (SELECT MAX(t2.innerID) FROM insertTest t2 WHERE element='a') +1, 'page1')");
sleep(15);
//mysql_query("ROLLBACK;");
mysql_query("COMMIT;");
Script 2:
//mysql_query("START TRANSACTION");
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage)VALUES(null, 'a', (SELECT MAX(t2.innerID) FROM insertTest t2 WHERE element='a') +1, 'page2')");
//mysql_query("COMMIT;");
I would have expected the page2 insert to happen before the page1 insert, because it's running without any transaction. But in fact, the page1 insert happened FIRST, causing the second script to also be delayed for about 15 seconds...
(ignore the AC-Id, played around a bit)
When using ROLLBACK in the first script, the second script is still delayed for 15 seconds, and then picks up the correct innerId:
So:
Non-transactional inserts are blocked while a transaction is active.
Inserts with subselects seem to be blocked as well.
So in the end, it seems like an insert with a subselect is an atomic operation? Or why else would the SELECT of the second page have been blocked?
Using the select and the insert as separate, non-transactional statements like this (on page 2, simulating the concurrent modification):
// read the current max innerId first...
$nextId = mysql_query("SELECT MAX(t2.innerId) AS a FROM insertTest t2 WHERE element = 'a'");
$nextId = mysql_fetch_array($nextId);
$nextId = $nextId["a"] + 1;
// ...then insert it in a second statement: the race window lies between the two
mysql_query("INSERT INTO insertTest (id, element, innerId, fromPage) VALUES (null, 'a', $nextId, 'page2')");
leads to the error I was trying to avoid:
So why does it work in the concurrent scenario when each modification is one query? Are inserts with subselects atomic?

Well, all (or almost all) databases support the necessary functionality for calculating innerId according to your rules. It is called a trigger, specifically a BEFORE INSERT trigger.
Your particular version will not work consistently in a multi-user environment. Few, if any, databases generate read locks on a table when starting an insert. That means that two insert statements issued very close together would generate the same value for innerid.
Because of concurrency considerations, you should do this calculation in the database, using triggers rather than on the application side.
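A minimal sketch of such a trigger for the example table (MySQL syntax; note that the MAX() read itself takes no lock, so the unique key on (element, innerId) should stay in place as a safety net):

DELIMITER //
CREATE TRIGGER myTable_before_insert BEFORE INSERT ON myTable
FOR EACH ROW
BEGIN
    -- compute the next per-element innerId at insert time
    SET NEW.innerId = (SELECT IFNULL(MAX(innerId), 0) + 1
                       FROM myTable
                       WHERE element = NEW.element);
END//
DELIMITER ;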
You always have the possibility of calculating innerid when you need it, rather than when you insert the value. This is computationally expensive, requiring either an order by (using variables) or a correlated subquery. Other databases support window/analytic functions, making such a calculation much easier to express.
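For that read-time calculation, a database with window functions (MySQL 8.0+, PostgreSQL, Oracle, ...) can derive innerId on the fly instead of storing it; a sketch against the example table:

SELECT id,
       element,
       ROW_NUMBER() OVER (PARTITION BY element ORDER BY id) AS innerId
FROM myTable;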

From what I read here: "Atomicity multiple MySQL subqueries in an INSERT/UPDATE query?", your query seems to be atomic. I've tested it on my MySQL server with InnoDB, with 4 different programs trying to execute the query 100,000 times each. Afterwards I was able to create a combined unique key on (element, innerId) and it worked well, so the query didn't seem to generate duplicates. However, I got:
Deadlock found when trying to get lock
So you might want to consider this http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html
EDIT: It seems I could circumvent the deadlock by changing the SQL to
INSERT INTO test (id, element, innerId)
VALUES (null, 'b',
        (SELECT COUNT(*) FROM test t2 WHERE element = 'b' FOR UPDATE) + 1);

Related

Why is the row id increased by one (or more) even after one (or more) ConstraintViolationException? [duplicate]

I have got a table with an auto-increment primary key. This table is meant to store millions of records, and I don't need to delete anything for now. The problem is that when new rows are inserted, because of some error, the auto-increment key leaves gaps in the ids. For example, after 5, the next id is 8, leaving a gap of 6 and 7. The result is that when I count the rows, I get 28000, but the max id is 58000. What can be the reason? I am not deleting anything. And how can I fix this issue?
P.S. I am using INSERT IGNORE while inserting records so that it doesn't give an error when I try to insert a duplicate entry in a unique column.
This is by design and will always happen.
Why?
Let's take 2 overlapping transactions that are doing INSERTs:
Transaction 1 does an INSERT, gets the value (let's say 42), does more work
Transaction 2 does an INSERT, gets the value 43, does more work
Then
Transaction 1 fails. Rolls back. 42 stays unused
Transaction 2 completes with 43
If consecutive values were guaranteed, every transaction would have to happen one after the other. Not very scalable.
Also see Do Inserted Records Always Receive Contiguous Identity Values (SQL Server, but the same principle applies).
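A minimal two-session sketch of how such a gap appears (table and column names are made up):

-- session 1
START TRANSACTION;
INSERT INTO mytable (val) VALUES ('a');  -- auto_increment hands out, say, 42
ROLLBACK;                                -- 42 is consumed and never reused

-- session 2
INSERT INTO mytable (val) VALUES ('b');  -- gets 43; the table now skips 42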
You can create a trigger to handle the auto increment as:
DELIMITER //
CREATE DEFINER=`root`@`localhost` TRIGGER `mytable_before_insert` BEFORE INSERT ON `mytable` FOR EACH ROW
BEGIN
    SET NEW.id = (SELECT IFNULL(MAX(id), 0) + 1 FROM mytable);
END//
DELIMITER ;
This is behaviour of InnoDB, the default storage engine of MySQL.
It really isn't a problem: the docs on "AUTO_INCREMENT Handling in InnoDB" explain that InnoDB initializes the in-memory auto-increment counter at server startup.
The query it uses for that is something like:
SELECT MAX(ai_col) FROM t FOR UPDATE;
This improves concurrency without really having an effect on your data.
To avoid this behaviour, use MyISAM instead of InnoDB as the storage engine.
Perhaps (I haven't tested this) a solution is to set innodb_autoinc_lock_mode to 0.
According to http://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html this might make things a bit slower (if you perform inserts of multiple rows in a single query) but should remove gaps.
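To check the current mode on a running server (the variable itself can only be set at startup, e.g. innodb_autoinc_lock_mode = 0 under [mysqld] in my.cnf):

SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';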
You can try an insert like:
INSERT IGNORE INTO mytable
SELECT (SELECT MAX(id) + 1 FROM mytable), 'value1', 'value2';
This will:
insert the new data with the last unused id (not auto-increment);
ignore the row if a duplicate entry is found in a unique field;
otherwise insert the new data normally.
(But this method does not support updating fields if a duplicate entry is found.)

In MySQL, is it advisable to rely on uniqueness constraints or to manually check if the row is already present?

I have a table:
userId | subject
with a uniqueness constraint on the combination of the two.
Now I am writing thousands of rows to this table every few minutes. The data stream is coming from a queue and it might repeat. However, I have to make sure that there is only one unique combination of (userId, subject) in the table.
Currently I rely on MySQL's uniqueness constraint, which throws an exception.
Another approach is to run a SELECT COUNT(*) query to check if the row is already present, and skip it if need be.
Since I want to write on average 4 rows per second, which is advisable?
Programming language: Java
EDIT:
Just in case I am not clear: the question here is whether relying on MySQL to throw an exception is better, or whether running a SELECT query before the INSERT operation is better, in terms of performance.
I thought a SELECT query was less CPU/IO-intensive than an INSERT query. If I run too many INSERTs, wouldn't that create many locks?
MySQL is ACID and employs transactional locking, so relying on its uniqueness constraints is very standard. Note that you can do this either via PRIMARY KEY or UNIQUE KEY (but favour the former if you can).
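In practice that means declaring the key and letting the INSERT deal with repeats; a sketch, with a made-up table name (the column in the commented variant is likewise a placeholder):

ALTER TABLE user_subjects ADD PRIMARY KEY (userId, subject);
-- repeats from the queue are then skipped silently:
INSERT IGNORE INTO user_subjects (userId, subject) VALUES (?, ?);
-- or, if a repeat should refresh other columns instead:
-- INSERT INTO user_subjects (userId, subject) VALUES (?, ?)
--     ON DUPLICATE KEY UPDATE updatedAt = NOW();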
A unique constraint is unique for the complete committed dataset.
There are several databases which allow you to set the "transaction isolation level".
userId subject
A 1
B 2
-------------------------
A 2
A 3
The two rows above the line are committed; every connection can read them. The two rows below the line are currently being written within your transaction. Within this connection, all four rows are visible.
If another thread / connection / transaction tries to store A-2, there will be an exception in one of the two transactions (the first one can commit its transaction, the second one can't).
Other isolation levels may fail earlier. But it is not possible to violate the unique-key constraint.

How to fetch the last 10 database transactions in IBM DB2?

I would like to fetch the last 10 database transactions in IBM DB2, i.e., which 10 transactions were executed last.
Depending on what you need that for, you will have to set up the DB2 audit facility or use an activity event monitor.
SQL tables have no implicit ordering; the order has to come from the data. Perhaps you should add a field to your table (e.g. an int counter) and re-import the data.
If you cannot do so, then here is one more idea which came to my mind while writing this answer: can we use rownum to get the last 10 records? Perhaps yes. Here is what you can try; I am just throwing this idea out and have not tested it:
Get MAX(rownum) from the table.
Fetch the records from the table between max(rownum) - 10 and max(rownum).
Aghh, it sounds ugly, but see if it works for you.
By the way, if you don't know about rowid, here is a link to learn about it:
http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.apdv.porting.doc%2Fdoc%2Fr0052875.html
If there is a column in your table that you can use to ascertain the correct order, such as a transaction number or a value generated by a sequence reference, or some column(s) that you can use to ORDER BY, then simply add DESC after each column in the ORDER BY clause, and FETCH FIRST 10 ROWS ONLY.
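A sketch of that query, assuming a sequence-generated trans_id column provides the ordering:

SELECT *
FROM mytable
ORDER BY trans_id DESC
FETCH FIRST 10 ROWS ONLY;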

JDBC - execute SELECT and INSERT atomically across concurrent threads

I have searched the web for simple examples of this, but to no avail. I need to run a select and an insert operation as an atomic unit in Java, using JDBC against an Oracle database.
Effectively I need to do the following:
Select code from users
Go through all codes until I find one that is not used (as users can be deleted, there may be codes available in the middle of the range)
Insert new user with that available code
This is a simple operation normally, but as my application is multi-threaded, I'm not sure how to go about it. Concurrent threads running at the same time could both try to insert using the same value for code.
There are a couple of workarounds or hacks that I can think of to do the job, but in general, how can I lock the table to make this operation atomic? Most of what I've seen involves row locks, but as I'm not updating, I don't see how this applies.
This is a tough problem to do entirely in SQL. Any solution is going to have race condition problems. If I were going to do it entirely in SQL, I'd use a deleted-code table. When users get deleted, you'd use some service to add their code to the deleted-code table. If the deleted-code table is empty, threads would use a sequence number to get their new code. Getting a code from the deleted-code table would need to be in a synchronized block because of its get-then-set nature with multiple SQL operations. I don't think SQL transactions are going to help there. They may keep the data consistent, but if two threads use the same code then one of the two commits is going to throw an exception.
I think a better, and faster, mechanism would be to have a separate thread manage these deleted codes. It could write it in a database but also keep a BlockingQueue of deleted codes for the other threads to consume. If there must be no holes and you are worried about crashing then it will need to validate the list of available holes by querying the user table at launch. It would not need to synchronize or do any SQL transactions because only it would be deleting from the deleted code table.
Hope this helps.
I would lean toward putting the logic in a stored procedure. Use SELECT ... FOR UPDATE to lock, then COMMIT to unlock.
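A plain-SQL sketch of that pessimistic pattern (names are assumptions; note that an empty table leaves no rows to lock, so a unique constraint on code is still wanted as a backstop):

SELECT code FROM users FOR UPDATE;  -- concurrent allocators block here
-- scan the locked result for the first free code in the application, then:
INSERT INTO users (id, code, name) VALUES (:id, :freeCode, :name);
COMMIT;                             -- releases the row locks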
You can add a filter to your insert statement and retry logic on the client end, I guess:
determine an available code (proposed code)
perform the insert with that filter, and determine the number of affected rows from the executeUpdate result (0 means a concurrent thread grabbed this code; try again)
The insert would look something along these lines, where 3 is your new id, 'Joe' your new user, and :proposedCode the code you think is available:
INSERT INTO users
SELECT 3, :proposedCode, 'Joe'
FROM dual
WHERE :proposedCode NOT IN (SELECT code FROM users)
How about:
insert into usertable (
  id,
  code,
  name
) values (
  user_id_sequence.nextval,
  ( select min(newcode)
    from usertable, (
      select level newcode
      from dual
      connect by level <= (select max(code) + 1 from usertable) )
    where not exists (select 1 from usertable where code = newcode)
  ),
  'mynewusername'
)
EDIT:
changed to max(code) + 1, so that if there is no gap available, a new code is still available.

Can I do an atomic MERGE in Oracle?

I have a couple of instances of a J2EE app running in a single WebLogic cluster.
At some point, these apps do a MERGE to insert or update a record into the back-end Oracle database. The MERGE checks to see if a row with a specified primary key is there or not. If it's there, update. If not, insert.
Now suppose two app instances want to insert or update a row with primary key = 100, and suppose the row doesn't exist. During the "check" stage of the merge, they both see that the row is not there, so both of them attempt to insert. Then I get a unique key constraint violation.
My question is this: Is there an atomic MERGE in Oracle? I'm looking for something that has a similar effect to INSERT ... FOR UPDATE in PL/SQL except that I can only execute SQL from my apps.
EDIT: I was unclear. I AM using the MERGE statement while this error still occurs. The thing is, only the "modifying" part is atomic, not the whole merge.
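For reference, a MERGE of the kind under discussion looks something like this (table and column names are placeholders, chosen to match the answer below):

MERGE INTO t23 t
USING (SELECT 100 AS id, 'FOX IN SOCKS' AS name FROM dual) s
ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN
  INSERT (id, name) VALUES (s.id, s.name);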
This is not a problem with MERGE as such. Rather the issue lies in your application. Consider this stored procedure:
create or replace procedure upsert_t23
  ( p_id   in t23.id%type
  , p_name in t23.name%type )
is
  cursor c is
    select null
    from t23
    where id = p_id;
  dummy varchar2(1);
begin
  open c;
  fetch c into dummy;
  if c%notfound then
    insert into t23
    values (p_id, p_name);
  else
    update t23
    set name = p_name
    where id = p_id;
  end if;
  close c;  -- close the cursor in either case
end;
So, this is the PL/SQL equivalent of a MERGE on T23. What happens if two sessions call it simultaneously?
SSN1> exec upsert_t23(100, 'FOX IN SOCKS')
SSN2> exec upsert_t23(100, 'MR KNOX')
SSN1 gets there first, finds no matching record, and inserts a record. SSN2 gets there second but, before SSN1 commits, finds no record, inserts a record, and hangs, because SSN1 has a lock on the unique index node for 100. When SSN1 commits, SSN2 will hurl a DUP_VAL_ON_INDEX violation.
The MERGE statement works in exactly the same way. Both sessions will check on (t23.id = 100), not find it and go down the INSERT branch. The first session will succeed and the second will hurl ORA-00001.
One way to handle this is to introduce pessimistic locking. At the start of the UPSERT_T23 procedure we lock the table:
...
lock table t23 in row share mode nowait;
open c;
...
Now, SSN1 arrives, grabs the lock and proceeds as before. When SSN2 arrives it can't get the lock, so it fails immediately. Which is frustrating for the second user but at least they are not hanging, plus they know someone else is working on the same record.
There is no syntax for INSERT which is equivalent to SELECT ... FOR UPDATE, because there is nothing to select. And so there is no such syntax for MERGE either. What you need to do is include the LOCK TABLE statement in the program unit which issues the MERGE. Whether this is possible for you depends on the framework you're using.
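Putting that together, a sketch of the lock-then-merge sequence (EXCLUSIVE mode is used in this sketch because it is guaranteed to conflict with itself across sessions; pick a weaker mode if it fits your workload):

LOCK TABLE t23 IN EXCLUSIVE MODE NOWAIT;  -- raises ORA-00054 if another session holds the lock
MERGE INTO t23 t
USING (SELECT 100 AS id, 'FOX IN SOCKS' AS name FROM dual) s
ON (t.id = s.id)
WHEN MATCHED THEN UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name);
COMMIT;  -- the commit releases the table lock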
The MERGE statement in the second session can not "see" the insert that the first session did until that session commits. If you reduce the size of the transactions the probability that this will occur will be reduced.
Or, can you sort or partition your data so that all records with a given primary key will be handled by the same session? A simple function like "primary key mod N" should distribute them evenly over N sessions.
btw, if two records have the same primary key, the second will overwrite the first. Sounds a little odd.
Yes, and it's called.... MERGE
EDIT: The only way to get this watertight is to insert, catch the DUP_VAL_ON_INDEX exception, and handle it appropriately (update, or perhaps insert another record). This can easily be done with PL/SQL, but you can't use that.
You're also looking for workarounds. Can you catch the ORA-00001 (unique constraint violated) error in Java and issue an extra UPDATE then?
In pseudo-code:
try {
    stmt.executeUpdate(mergeSql);    // MERGE
} catch (SQLIntegrityConstraintViolationException e) {
    stmt.executeUpdate(updateSql);   // the other session won the insert; UPDATE instead
}
I am surprised that MERGE would behave the way you describe, but I haven't used it sufficiently to say whether it should or not.
In any case, you might have the transactions that wish to execute the merge set their isolation level to SERIALIZABLE. I think that may solve your issue.
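For reference, that would look like this in Oracle, per transaction or per session:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- or, for everything on the session:
ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;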
