I have a couple instances of a J2EE app running in a single WebLogic cluster.
At some point, these apps do a MERGE to insert or update a record into the back-end Oracle database. The MERGE checks to see if a row with a specified primary key is there or not. If it's there, update. If not, insert.
Now suppose two app instances want to insert or update a row with primary key = 100. Suppose the row doesn't exist. During the "check" stage of the merge, they both see that the row's not there, so both of them attempt to insert. Then I get a unique key constraint violation.
My question is this: Is there an atomic MERGE in Oracle? I'm looking for something that has a similar effect to INSERT ... FOR UPDATE in PL/SQL except that I can only execute SQL from my apps.
EDIT: I was unclear. I AM using the MERGE statement, and this error still occurs. The thing is, only the "modifying" part is atomic, not the whole merge.
This is not a problem with MERGE as such. Rather the issue lies in your application. Consider this stored procedure:
create or replace procedure upsert_t23
    ( p_id in t23.id%type
    , p_name in t23.name%type )
is
    cursor c is
        select null
        from t23
        where id = p_id;
    dummy varchar2(1);
begin
    open c;
    fetch c into dummy;
    if c%notfound then
        insert into t23
        values (p_id, p_name);
    else
        update t23
        set name = p_name
        where id = p_id;
    end if;
    close c;
end;
So, this is the PL/SQL equivalent of a MERGE on T23. What happens if two sessions call it simultaneously?
SSN1> exec upsert_t23(100, 'FOX IN SOCKS')
SSN2> exec upsert_t23(100, 'MR KNOX')
SSN1 gets there first, finds no matching record and inserts a record. SSN2 gets there second but before SSN1 commits, finds no record, inserts a record and hangs because SSN1 has a lock on the unique index node for 100. When SSN1 commits SSN2 will hurl a DUP_VAL_ON_INDEX violation.
The MERGE statement works in exactly the same way. Both sessions will check on (t23.id = 100), not find it and go down the INSERT branch. The first session will succeed and the second will hurl ORA-00001.
One way to handle this is to introduce pessimistic locking. At the start of the UPSERT_T23 procedure we lock the table:
...
lock table t23 in share row exclusive mode nowait;
open c;
...
Now, SSN1 arrives, grabs the lock and proceeds as before. When SSN2 arrives it can't get the lock, so it fails immediately. Which is frustrating for the second user but at least they are not hanging, plus they know someone else is working on the same record.
There is no syntax for INSERT which is equivalent to SELECT ... FOR UPDATE, because there is nothing to select. And so there is no such syntax for MERGE either. What you need to do is include the LOCK TABLE statement in the program unit which issues the MERGE. Whether this is possible for you depends on the framework you're using.
The MERGE statement in the second session cannot "see" the insert that the first session did until that session commits. If you reduce the size of the transactions, the probability that this will occur will be reduced.
Or can you sort or partition your data so that all records with a given primary key go to the same session? A simple function like "primary key mod N" should distribute them evenly across N sessions.
btw, if two records have the same primary key, the second will overwrite the first. Sounds a little odd.
Yes, and it's called.... MERGE
EDIT: The only way to get this water tight is to insert, catch the dup_val_on_index exception and handle it appropriately (update, or insert other record perhaps). This can easily be done with PL/SQL, but you can't use that.
You're also looking for workarounds. Can you catch the dup_val_on_index in Java and issue an extra UPDATE again?
In pseudo-code:
try {
// MERGE
}
catch (dup_val_on_index) {
// UPDATE
}
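The pseudo-code above could be fleshed out roughly as follows. This is only a sketch: `SqlAction` and `MergeWithFallback` are hypothetical names, the two actions stand in for your real JDBC calls, and it assumes the driver reports ORA-00001 as vendor error code 1.

```java
import java.sql.SQLException;

/** A database action that may fail with an SQLException (hypothetical helper type). */
interface SqlAction {
    void run() throws SQLException;
}

final class MergeWithFallback {
    // Assumption: the Oracle driver reports ORA-00001 (unique constraint violated) as code 1.
    private static final int ORA_UNIQUE_VIOLATION = 1;

    /**
     * Runs the MERGE; if it loses the insert race, runs the UPDATE instead.
     * Returns true when the fallback UPDATE was needed.
     */
    static boolean upsert(SqlAction merge, SqlAction update) throws SQLException {
        try {
            merge.run();
            return false;
        } catch (SQLException e) {
            if (e.getErrorCode() != ORA_UNIQUE_VIOLATION) {
                throw e; // unrelated failure: do not mask it
            }
            // The other session has committed its insert, so the row exists now.
            update.run();
            return true;
        }
    }
}
```

In real code the two lambdas would each execute a PreparedStatement on the same connection, e.g. `MergeWithFallback.upsert(() -> mergeStmt.executeUpdate(), () -> updateStmt.executeUpdate())`.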
I am surprised that MERGE would behave the way you describe, but I haven't used it sufficiently to say whether it should or not.
In any case, you might have the transactions that wish to execute the merge set their isolation level to SERIALIZABLE. I think that may solve your issue.
Related
I have a table with an auto-increment primary key. The table is meant to store millions of records, and I don't need to delete anything for now. The problem is that when new rows are inserted, some error causes the auto-increment key to leave gaps in the ids. For example, after 5 the next id is 8, skipping 6 and 7. As a result, counting the rows gives 28000, but the max id is 58000. What could be the reason? I am not deleting anything. And how can I fix this issue?
P.S. I am using insert ignore while inserting records so that it doesn't give error when I try to insert duplicate entry in unique column.
This is by design and will always happen.
Why?
Let's take 2 overlapping transactions that are doing INSERTs
Transaction 1 does an INSERT, gets the value (let's say 42), does more work
Transaction 2 does an INSERT, gets the value 43, does more work
Then
Transaction 1 fails. Rolls back. 42 stays unused
Transaction 2 completes with 43
If consecutive values were guaranteed, every transaction would have to happen one after the other. Not very scalable.
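To make the mechanism concrete, here is a tiny in-memory simulation. It is not how any database implements it; `AutoIncrementGapDemo` is a made-up class in which an `AtomicLong` stands in for the auto-increment counter and a list stands in for the committed rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class AutoIncrementGapDemo {
    // Stand-in for the table's auto-increment counter: it only ever moves forward.
    private final AtomicLong counter;
    private final List<Long> committedIds = new ArrayList<>();

    AutoIncrementGapDemo(long firstId) {
        counter = new AtomicLong(firstId);
    }

    /** Simulates one INSERT in a transaction that then commits or rolls back. */
    long insert(boolean commit) {
        long id = counter.getAndIncrement(); // the value is consumed immediately
        if (commit) {
            committedIds.add(id); // a rollback discards the row, not the id
        }
        return id;
    }

    List<Long> committedIds() {
        return committedIds;
    }
}
```

Starting the counter at 42, a rolled-back insert followed by a committed one leaves only id 43 in the table, and 42 is never handed out again.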
Also see Do Inserted Records Always Receive Contiguous Identity Values (SQL Server but same principle applies)
You can create a trigger to handle the auto increment as:
CREATE DEFINER=`root`@`localhost` TRIGGER `mytable_before_insert` BEFORE INSERT ON `mytable` FOR EACH ROW
BEGIN
    SET NEW.id = (SELECT IFNULL(MAX(id), 0) + 1 FROM mytable);
END
This is a problem in InnoDB, the storage engine of MySQL.
It isn't really a problem. The docs on “AUTO_INCREMENT Handling in InnoDB” say, roughly, that InnoDB keeps an auto-increment counter for the table and initializes it at startup
And the query it uses is something like
SELECT MAX(ai_col) FROM t FOR UPDATE;
This improves concurrency without really having an effect on your data.
To avoid this, use MyISAM instead of InnoDB as the storage engine.
Perhaps (I haven't tested this) a solution is to set innodb_autoinc_lock_mode to 0.
According to http://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html this might make things a bit slower (if you perform inserts of multiple rows in a single query) but should remove gaps.
You can try insert like :
insert ignore into table select (select max(id)+1 from table), "value1", "value2" ;
This will:
insert the new data with the next unused id (max(id)+1, not the auto-increment value)
ignore the row if a duplicate entry is found in a unique field
otherwise insert the new data normally
(but this method does not support updating fields when a duplicate entry is found)
I need to insert an employee into the Employee table. What I want is to avoid duplicate inserts, i.e. if two threads try to insert the same employee at the same time, then the last transaction should fail. For example, if first_name and hire_date are the same for two employees (the same employee coming from two threads), then fail the last transaction.
Approach 1: The first approach I can think of is to put a constraint at the column level (like a combined unique constraint on first_name and hire_date), or to check in the query whether the employee exists and throw an error (I believe that is possible through PL/SQL).
Approach 2: Can it be done at the Java level too, e.g. by creating a method that first checks whether the employee exists and then throws an error? In that case I need to make the method synchronized (or use a synchronized block), but that will impact performance; it will unnecessarily hold up other transactions as well. Is there a way to take the lock (a ReentrantLock) or use synchronization based on name/hire date, so that only those specific transactions that have the same name and hire date are put on hold?
public void save(Employee emp){
//hibernate api to save
}
I believe Approach 1 should be preferred, as it's simpler and easier to implement. Right? Even if so, I would like to know whether it can be handled efficiently at the Java level.
What I want is to avoid duplicate inserts
and
but that will impact performance; it will unnecessarily hold up other transactions as well
So, you want highly concurrent inserts that guarantee no duplicates.
Whether you do this in Java or in the database, the only way to avoid duplicate inserts is to serialize (or, Java-speak, synchronize). That is, have one transaction wait for another.
The Oracle database will do this automatically for you if you create a PRIMARY KEY or UNIQUE constraint on your key values. Simultaneous inserts that are not duplicates will not interfere or wait for one another. However, if two sessions simultaneously attempt duplicate inserts, the second will wait until the first completes. If the first session completed via COMMIT, then the second transaction will fail with a duplicate key on index violation. If the first session completed via ROLLBACK, the second transaction will complete successfully.
You can do something similar in Java as well, but the problem is you need a locking mechanism that is accessible to all sessions. synchronized and similar alternatives work only if all sessions are running in the same JVM.
Also, in Java, a key to maximizing concurrency and minimizing waits would be to only wait for actual duplicates. You can achieve something close to that by hashing the incoming key values and then synchronizing only on that hash. That is, for example, put 65,536 objects into a list. Then when an insert wants to happen, hash the incoming key values to a number between 1 and 65536. Then get that object from the list and synchronize on that. Of course, you can also synchronize on the actual key values, but a hash is usually as good and can be easier to work with, especially if the incoming key values are unwieldy or sensitive.
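A rough sketch of that hashing idea, for what it's worth. `StripedLocks` is a made-up name, the stripes are indexed from 0 rather than 1, and as noted above this only helps when every writer runs in the same JVM.

```java
import java.util.Objects;
import java.util.function.Supplier;

final class StripedLocks {
    private final Object[] stripes;

    StripedLocks(int stripeCount) {
        stripes = new Object[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new Object();
        }
    }

    /** Maps the key values to one stripe; equal key values always map to the same stripe. */
    Object lockFor(Object... keyValues) {
        // floorMod keeps the index non-negative even when the hash is negative.
        int index = Math.floorMod(Objects.hash(keyValues), stripes.length);
        return stripes[index];
    }

    /** Runs the action while holding the stripe for the given key values. */
    <T> T withLock(Supplier<T> action, Object... keyValues) {
        synchronized (lockFor(keyValues)) {
            return action.get();
        }
    }
}
```

Usage would look something like `locks.withLock(() -> saveIfAbsent(emp), emp.getFirstName(), emp.getHireDate())`, where `saveIfAbsent` is whatever check-then-insert method you already have. Unrelated employees occasionally share a stripe and wait for each other, which costs a little concurrency but never correctness.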
That all said, this should absolutely all be done in the database using a simple PRIMARY KEY constraint on your table and appropriate error handling.
One of the main reasons of using databases is that they give you consistency.
You are volunteering to put some of that responsibility back into your application. That very much sounds like the wrong approach. Instead, you should study exactly which capabilities your database offers; and try to make "as much use of them as possible".
In that sense you try to fix a problem on the wrong level.
Pseudo Code :
void save(Employee emp) {
    if (!isEmployeeExist(emp)) {
        // Hibernate api to save
    }
}
boolean isEmployeeExist(Employee emp) {
    // build and run query for finding the employee
    return true; // if employee exists else return false
}
Good question. I would strongly suggest using MERGE (INSERT and UPDATE in a single DML statement) in this case. Let Oracle handle the transactions and locks. It's best in your case.
You should create a primary key or unique constraint (approach 1) regardless of which solution you choose, to preserve data integrity.
-- Sample statement
MERGE INTO employees e
USING (SELECT * FROM hr_records) h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
Since the row is not inserted yet, isolation levels such as READ_COMMITTED/REPEATABLE_READ will not apply to it.
The best option is to apply a DB constraint (unique). If that does not exist, then in a multi-node setup you can't achieve it through Java locks either, as a request can go to any node.
So, in that case we need some kind of distributed lock functionality.
We can create a lock table in which we define, for each table, that only one insertion (or one collection of insertions) is possible at a node.
Ex:
Table_Name, Lock_Acquired
emp, 'N'
Now any code can read this row with READ_COMMITTED and try to update Lock_Acquired to 'Y',
so any other code in another thread or on another node won't be able to proceed further, and the lock will be given only when the previous lock has been released.
This gives you a highly concurrent system that avoids duplication, but it will suffer from scalability issues. So decide according to what you want to achieve.
I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations on this table. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK Hibernate merge works on PK only, so it's inappropriate. I guess, a native query might help or not, as it may or may not be committed when the second INSERT takes place.
Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
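Something along these lines, as a sketch only. `TextIdStore` and `TextIds` are hypothetical names, each store call is assumed to run in its own short transaction, and the JDBC `SQLIntegrityConstraintViolationException` stands in for Hibernate's ConstraintViolationException so the snippet needs no extra libraries.

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.Optional;

/** Hypothetical data-access interface; each call is assumed to run in its own short transaction. */
interface TextIdStore {
    Optional<Long> findId(String text) throws SQLException;

    /** Inserts the text and returns its new id; throws on a UNIQUE violation. */
    long insert(String text) throws SQLException;
}

final class TextIds {
    /** Returns the id for the text, inserting a row only if none exists yet. */
    static long findOrCreate(TextIdStore store, String text) throws SQLException {
        Optional<Long> existing = store.findId(text);
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            return store.insert(text);
        } catch (SQLIntegrityConstraintViolationException e) {
            // Lost the race: another request committed the same text first,
            // so the row is visible now. If it still is not, rethrow the original error.
            return store.findId(text).orElseThrow(() -> e);
        }
    }
}
```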
I'm not sure this scales well if the likelihood of a duplicate is high: it is a lot of extra work if some percentage (depending on the database) of transactions have to fail the insert and then reselect.
A secondary transaction minimizes the length of time the transaction that adds the new text takes. Assuming the database supports it correctly, it might still be possible for the thread 1 transaction to cause the thread 2 select/insert to hang until the thread 1 transaction is committed or rolled back. Overall database design might also affect transaction throughput.
I don't necessarily question why sometext can't be a PK; I'm wondering why you need to break it out at all. Of course, with large volumes you might save substantial space if the sometext records are large. It almost seems like you're trying to emulate a Lucene index to give you a complete list of text values.
I have searched the web for simple examples to this but to no avail. I need to run a select and insert operation as an atomic unit in Java, using JDBC against an Oracle database.
Effectively I need to do the following:
Select code from users
Go through all codes until I find one that is not used (as users can be deleted there may be codes available in the middle of the range)
Insert new user with that available code
This is normally a simple operation, but as my application is multi-threaded I'm not sure how to go about it, since concurrent threads could both try to insert using the same value for code.
There are a couple workarounds or hacks that I can think of to do the job but in general how can I lock the table to make this operation atomic? Most of what I've seen involves row locks but as I'm not updating I don't see how this applies.
This is a tough problem to do entirely in SQL. Any solution is going to have race condition problems. If I was going to do it entirely in SQL I'd use a deleted code table. When users then get deleted you'd use some service to add their code to the deleted table. If the deleted code table is empty threads would use a sequence number to get their new code. Getting a code from the deleted would need to be in a synchronized block because of the get and then set nature with multiple SQL operations. I don't think SQL transactions are going to help there. They may keep the data consistent but if two threads use the same code then one of the two commits is going to throw an exception.
I think a better, and faster, mechanism would be to have a separate thread manage these deleted codes. It could write it in a database but also keep a BlockingQueue of deleted codes for the other threads to consume. If there must be no holes and you are worried about crashing then it will need to validate the list of available holes by querying the user table at launch. It would not need to synchronize or do any SQL transactions because only it would be deleting from the deleted code table.
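A bare-bones version of that idea might look like this. `CodeAllocator` is a made-up name, persistence and the start-up validation against the user table are left out, and freed codes come back in the order they were released rather than lowest first.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class CodeAllocator {
    // Codes released by deleted users, waiting to be reused.
    private final BlockingQueue<Integer> freedCodes = new LinkedBlockingQueue<>();
    // Stand-in for the database sequence used when no freed code is available.
    private final AtomicInteger nextFresh;

    CodeAllocator(int firstUnusedCode) {
        nextFresh = new AtomicInteger(firstUnusedCode);
    }

    /** Called when a user is deleted. */
    void release(int code) {
        freedCodes.add(code);
    }

    /** Hands out a freed code if one exists, otherwise a fresh one. */
    int acquire() {
        // poll() is atomic, so two threads can never receive the same queued code.
        Integer reused = freedCodes.poll();
        return reused != null ? reused : nextFresh.getAndIncrement();
    }
}
```

Because both the queue and the counter are thread-safe, callers need no synchronized block of their own around `acquire()`.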
Hope this helps.
I would lean toward putting the logic in a stored procedure. Use "select for update" to lock, then commit to unlock.
You can add a filter to your insert statement and retry logic on the client end, I guess:
determine an available code (proposed code)
perform the insert with a filter
determine the number of rows from the executeUpdate result (0 means a concurrent thread grabbed this code, try again)
The insert would look something along these lines where 3 is your new id, 'Joe' your new user, and proposedCode the one you think is available:
INSERT INTO users
SELECT 3, :proposedCode, 'Joe'
FROM dual
WHERE :proposedCode NOT IN (SELECT code FROM users)
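The client-side retry could be sketched like this. `FilteredInsert` and `CodeClaimer` are hypothetical names, and `tryInsert` is assumed to run the statement above and return the executeUpdate row count. I'd still keep a UNIQUE constraint on code as a backstop, since the NOT IN filter cannot see another session's uncommitted insert.

```java
import java.sql.SQLException;
import java.util.function.IntSupplier;

/** Hypothetical wrapper around the filtered INSERT; returns the executeUpdate row count. */
interface FilteredInsert {
    int tryInsert(int proposedCode) throws SQLException;
}

final class CodeClaimer {
    /** Proposes codes until one insert affects a row, or gives up after maxAttempts. */
    static int claim(IntSupplier proposeCode, FilteredInsert insert, int maxAttempts)
            throws SQLException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int code = proposeCode.getAsInt();
            if (insert.tryInsert(code) == 1) {
                return code; // one row inserted: the code is ours
            }
            // 0 rows: a concurrent thread grabbed this code, so propose another one
        }
        throw new SQLException("no free code claimed after " + maxAttempts + " attempts");
    }
}
```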
How about:
insert into usertable (
id,
code,
name
) values (
user_id_sequence.nextval,
(
select min(newcode)
from usertable, (
select level newcode
from dual
connect by level <= (select max(code)+1 from usertable))
where not exists (select 1 from usertable where code = newcode)
),
'mynewusername'
)
EDIT:
changed to max(code) + 1, so if there is no gap available, there is a new code available.
I have a webservice in java that receives a list of information to be inserted or updated in a database. I don't know which one is to insert or update.
Which is the best approach to obtain better performance:
Iterate over the list (an object list, with the table PK on it) and try to insert each entry into the database. If the insert fails, run an update.
Try to load the entry from the database. If a result is retrieved, update it; if not, insert the entry.
Another option? Tell me about it :)
In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal then first get rid of the word iterate from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.
The important thing is to understand the balance, or ratio, between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this to the database". Then create concrete strategies that (for example):
check for the primary key; if zero records are found, do the insert, else update
do the update and, if it fails, do the insert
others
And then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible, which may depend on your domain, you can add a heuristic that selects the best strategy based on the input entities in the set.
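A minimal sketch of that strategy setup. All names here (`RowGateway`, `PersistStrategy`, `InsertFirst`, `UpdateFirst`, `Strategies`) are invented for illustration, the gateway would wrap your real JDBC calls, and a short configured name is used instead of a fully qualified class name to avoid reflection.

```java
/** Hypothetical table gateway; a real one would wrap JDBC statements. */
interface RowGateway {
    boolean insert(long id, String value); // false when the key already exists

    int update(long id, String value); // number of rows changed
}

/** The abstract "persist this to the database" operation. */
interface PersistStrategy {
    void persist(RowGateway table, long id, String value);
}

/** Optimistic for new rows: cheapest while most entries are inserts. */
final class InsertFirst implements PersistStrategy {
    public void persist(RowGateway table, long id, String value) {
        if (!table.insert(id, value)) {
            table.update(id, value);
        }
    }
}

/** Optimistic for existing rows: cheapest once most entries are updates. */
final class UpdateFirst implements PersistStrategy {
    public void persist(RowGateway table, long id, String value) {
        if (table.update(id, value) == 0) {
            table.insert(id, value);
        }
    }
}

final class Strategies {
    /** Picks the strategy named in the configuration file. */
    static PersistStrategy named(String name) {
        switch (name) {
            case "insert-first":
                return new InsertFirst();
            case "update-first":
                return new UpdateFirst();
            default:
                throw new IllegalArgumentException("unknown strategy: " + name);
        }
    }
}
```

Switching from one behaviour to the other is then a one-line configuration change, e.g. from "insert-first" to "update-first" once the table has filled up.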
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    INSERT INTO
        my_table (id, col1)
    SELECT
        _id, _col1;
EXCEPTION WHEN unique_violation THEN
    UPDATE
        my_table
    SET
        col1 = _col1
    WHERE
        id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.