Duplicate key exception on merge statement in DB2 - java

The problem: Every day we get lots of parts that we want to add to our stock. We get messages over a queue that we read from (using 4 different servers). The queue always contains elements, so the servers read as fast as they can. We want the servers to simply update the article if it already exists, and insert it if it doesn't.
Our first, naive solution was simply to select to see if the article existed, and insert it if it didn't. However, since there was no row for us to lock, we ran into problems with two servers doing the select at the same time, finding nothing, and then both trying to insert. Of course one of them gave us a duplicate key exception.
So instead we looked to the merge statement. We made a merge statement that looked like this (simplified for clarity):
MERGE INTO articles sr
USING (
    VALUES (:PARAM_ARTICLE_NUMBER)
) AS v(ARTICLE_NUMBER)
ON sr.ARTICLE_NUMBER = v.ARTICLE_NUMBER
WHEN MATCHED THEN
    UPDATE SET
        QUANTITY = QUANTITY + :PARAM_QUANTITY,
        ARRIVED_DATE = CASE WHEN ARRIVED_DATE IS NULL
                            THEN :PARAM_ARRIVED_DATE
                            ELSE ARRIVED_DATE END
WHEN NOT MATCHED THEN
    INSERT (ARTICLE_NUMBER, QUANTITY, ARRIVED_DATE)
    VALUES (v.ARTICLE_NUMBER, :PARAM_QUANTITY, CURRENT_TIMESTAMP);
However, for some reason we are still getting duplicate key problems. My belief is that even if the merge statement is atomic, two merge statements can run concurrently and perform their matching check at the same time.
Is there any way, short of locking the whole table, to make sure we only get one insert?

In a similar situation running the MERGE with the Repeatable Read isolation level solved our problem. RS was insufficient, because it still allowed phantom rows, which is exactly the issue you are experiencing. You can simply add WITH RR at the end of the statement and try it out.
Our test suite runs with up to 1000 simultaneous connections and we don't see concurrency much affected by the RR isolation used for that particular statement only.
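For illustration only, this is roughly how the statement from the question might look with the clause appended when issued through JDBC; plain ? placeholders stand in for the named host variables above, and the statement itself is otherwise the (untested) MERGE from the question:
String mergeSql =
        "MERGE INTO articles sr "
      + "USING (VALUES (?)) AS v(ARTICLE_NUMBER) "
      + "ON sr.ARTICLE_NUMBER = v.ARTICLE_NUMBER "
      + "WHEN MATCHED THEN UPDATE SET "
      + "  QUANTITY = QUANTITY + ?, "
      + "  ARRIVED_DATE = CASE WHEN ARRIVED_DATE IS NULL THEN ? ELSE ARRIVED_DATE END "
      + "WHEN NOT MATCHED THEN INSERT (ARTICLE_NUMBER, QUANTITY, ARRIVED_DATE) "
      + "  VALUES (v.ARTICLE_NUMBER, ?, CURRENT_TIMESTAMP) "
      + "WITH RR"; // the only change: the isolation clause on this statement alone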

Do the insert first, catch the duplicate key exception if thrown; then update instead.
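A hedged JDBC sketch of that insert-first pattern, reusing the column names from the MERGE above; the connection, articleNumber and quantity variables are assumptions, and 23505 is the standard SQLSTATE DB2 reports for a duplicate key:
try (PreparedStatement insert = connection.prepareStatement(
        "INSERT INTO articles (ARTICLE_NUMBER, QUANTITY, ARRIVED_DATE) "
      + "VALUES (?, ?, CURRENT_TIMESTAMP)")) {
    insert.setString(1, articleNumber);
    insert.setInt(2, quantity);
    insert.executeUpdate();
} catch (SQLException e) {
    if (!"23505".equals(e.getSQLState())) {
        throw e; // not a duplicate-key problem
    }
    // another server inserted the article first: fall back to the update
    try (PreparedStatement update = connection.prepareStatement(
            "UPDATE articles SET QUANTITY = QUANTITY + ? WHERE ARTICLE_NUMBER = ?")) {
        update.setInt(1, quantity);
        update.setString(2, articleNumber);
        update.executeUpdate();
    }
}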
Charles

Related

Does JDBC getGeneratedKeys() always return keys in the same order as the inserted elements?

I use executeBatch() with JDBC to insert multiple rows, and I want to get the ids of the inserted rows for another insert. I use this code for that purpose:
insertInternalStatement = dbConnection.prepareStatement(INSERT_RECORD, generatedColumns);
for (Foo foo : foosHashSet) {
    insertInternalStatement.setInt(1, foo.getMe());
    insertInternalStatement.setInt(2, foo.getMe2());
    // ..
    insertInternalStatement.addBatch();
}
insertInternalStatement.executeBatch();
// now get inserted ids
try (ResultSet generatedKeys = insertInternalStatement.getGeneratedKeys()) {
    Iterator<Foo> fooIterator = foosHashSet.iterator();
    while (generatedKeys.next() && fooIterator.hasNext()) {
        fooIterator.next().setId(generatedKeys.getLong(1));
    }
}
It works fine and ids are returned. My questions are:
If I iterate over getGeneratedKeys() and foosHashSet, will the ids be returned in the same order, so that each returned id from the database belongs to the corresponding Foo instance?
What about when I use multiple threads and the above code runs in several threads simultaneously?
Is there any other solution for this? I have two tables, foo1 and foo2, and I want to first insert the foo1 records and then use their primary ids as foo2 foreign keys.
Given that support for getGeneratedKeys for batch execution is not defined in the JDBC specification, the behavior will depend on the driver used. I would expect any driver that supports generated keys for batch execution to return the ids in the order they were added to the batch.
However, the fact that you are using a Set is problematic. Iteration order for most sets is not defined and could change between iterations (usually only after modification, but in theory you can't assume anything about the order). You need to use something with a guaranteed order, e.g. a List or maybe a LinkedHashSet.
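A minimal sketch of that suggestion, reusing the names from the question but copying the set into a List so the batching order and the key-reading order are guaranteed to match (the getters, INSERT_RECORD and generatedColumns are assumed to be as in the question):
List<Foo> foos = new ArrayList<>(foosHashSet); // fixed, predictable order

PreparedStatement ps = dbConnection.prepareStatement(INSERT_RECORD, generatedColumns);
for (Foo foo : foos) {
    ps.setInt(1, foo.getMe());
    ps.setInt(2, foo.getMe2());
    ps.addBatch();
}
ps.executeBatch();

// correlate the keys with the same list that was used to build the batch
try (ResultSet keys = ps.getGeneratedKeys()) {
    for (Foo foo : foos) {
        if (!keys.next()) {
            break; // fewer keys than rows: behavior is driver-specific
        }
        foo.setId(keys.getLong(1));
    }
}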
Applying multi-threading here would probably be a bad idea: you should only use a JDBC connection from a single thread at a time. Accounting for multi-threading would either require correct locking, or require you to split up the workload so it can use separate connections. Whether that would improve or worsen performance is hard to say.
You should be able to iterate through multiple generated keys without problems. They will be returned in the order in which they were inserted.
I don't think there should be any problem adding threads in this matter. The only thing I'm pretty sure of is that you would not be able to control the order in which the ids are inserted into both tables without some added code complexity.
You could store all of the ids inserted first in a Collection and, after all threads/iterations have finished, insert them into the second table.
The iteration order is the same as long as the fooHashSet is not altered.
One could think of using a LinkedHashSet, which yields the items in insertion order. Especially when nothing is removed or overwritten, that would be nice.
Concurrent access would be problematic.
Use a LinkedHashSet without removal, only adding new items, and additionally wrap it in Collections.synchronizedSet. For set alterations one would need a Semaphore or such, as synchronizing such a large code block is a no-go.
An - even better performing - solution might be to make a local copy:
List<Me> list = fooHashSet.stream()
        .map(Foo::Me)
        .collect(Collectors.toList());
However, this is still a somewhat unsatisfying solution:
a batch for multiple inserts, and then per insert several other updates/inserts.
Transitioning to JPA instead of JDBC would somewhat alleviate the situation.
After some experience, however, I would pose the question whether a database is still the correct tool (hammer) at that point. If it is a graph or a hierarchical data structure, then storing the entire data structure as XML with JAXB in a single database table could be the best solution: faster, easier development, verifiable data.
Use the database for the main data, and the XML for an edited/processed document.
Yes. As per the definition of executeBatch, it says:
createFcCouponStatement.executeBatch()
Submits a batch of commands to the database for execution and if all commands execute successfully, returns an array of update counts. The int elements of the array that is returned are ordered to correspond to the commands in the batch, which are ordered according to the order in which they were added to the batch. The elements in the array returned by the method executeBatch may be one of the following:
A number greater than or equal to zero -- indicates that the command was processed successfully and is an update count giving the number of rows in the database that was affected by the command's execution
A value of SUCCESS_NO_INFO -- indicates that the command was processed successfully but that the number of rows affected is unknown
If one of the commands in a batch update fails to execute properly, this method throws a BatchUpdateException, and a JDBC driver may or may not continue to process the remaining commands in the batch. However, the driver's behavior must be consistent with a particular DBMS, either always continuing to process commands or never continuing to process commands. If the driver continues processing after a failure, the array returned by the method BatchUpdateException.getUpdateCounts will contain as many elements as there are commands in the batch, and at least one of the elements will be the following:
A value of EXECUTE_FAILED -- indicates that the command failed to execute successfully and occurs only if a driver continues to process commands after a command fails
The possible implementations and return values have been modified in the Java 2 SDK, Standard Edition, version 1.3 to accommodate the option of continuing to process commands in a batch update after a BatchUpdateException object has been thrown.
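For completeness, a small sketch of inspecting those per-command results when a failure occurs; insertInternalStatement is the statement from the question, and EXECUTE_FAILED entries can only appear if the driver keeps processing after an error:
try {
    int[] counts = insertInternalStatement.executeBatch();
    // on success every entry is >= 0 or Statement.SUCCESS_NO_INFO, in batch order
} catch (BatchUpdateException e) {
    int[] counts = e.getUpdateCounts();
    for (int i = 0; i < counts.length; i++) {
        if (counts[i] == Statement.EXECUTE_FAILED) {
            System.err.println("Batch entry " + i + " failed");
        }
    }
}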

How to avoid duplicate insert in DB through Java?

I need to insert employees into an Employee table. What I want is to avoid duplicate inserts, i.e. if two threads try to insert the same employee at the same time, then the last transaction should fail. For example, if first_name and hire_date are the same for two employees (the same employee coming from two threads), then fail the last transaction.
Approach 1: The first approach I can think of is to put the constraint at the column level (like a combined unique constraint on first_name and hire_date), or to check in the query whether the employee exists and throw an error (I believe this is possible through PL/SQL).
Approach 2: Can it be done at the Java level too, e.g. by creating a method which first checks if the employee exists and then throws an error? In that case I need to make the method synchronized (or use a synchronized block), but that will impact performance and unnecessarily hold up other transactions as well. Is there a way I can use a lock (ReentrantLock) or synchronization based on name/hire_date, so that only the specific transactions which have the same name and hire_date are put on hold?
public void save(Employee emp) {
    // hibernate api to save
}
I believe Approach 1 should be preferred as it is simpler and easier to implement. Right? Even so, I would like to know whether it can be handled efficiently at the Java level.
What I want is to avoid duplicate inserts
and
but that will impact performance and unnecessarily hold up other transactions as well
So, you want highly concurrent inserts that guarantee no duplicates.
Whether you do this in Java or in the database, the only way to avoid duplicate inserts is to serialize (or, in Java terms, synchronize), that is, to have one transaction wait for another.
The Oracle database will do this automatically for you if you create a PRIMARY KEY or UNIQUE constraint on your key values. Simultaneous inserts that are not duplicates will not interfere or wait for one another. However, if two sessions simultaneously attempt duplicate inserts, the second will wait until the first completes. If the first session completed via COMMIT, then the second transaction will fail with a duplicate key on index violation. If the first session completed via ROLLBACK, the second transaction will complete successfully.
You can do something similar in Java as well, but the problem is that you need a locking mechanism that is accessible to all sessions. synchronized and similar alternatives work only if all sessions are running in the same JVM.
Also, in Java, the key to maximizing concurrency and minimizing waits is to wait only for actual duplicates. You can get something close to that by hashing the incoming key values and then synchronizing only on that hash. That is, for example, put 65,536 objects into a list. Then, when an insert wants to happen, hash the incoming key values to a number between 0 and 65,535, get that object from the list, and synchronize on it. Of course, you can also synchronize on the actual key values, but a hash is usually as good and can be easier to work with, especially if the incoming key values are unwieldy or sensitive.
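A rough sketch of that hash-bucket idea; the Employee getters and the bucket count are illustrative, not from the question, and as noted above this only helps when all sessions run in the same JVM:
public class EmployeeSaver {
    // one lock object per bucket; only real duplicates contend for the same lock
    private static final Object[] LOCKS = new Object[65536];
    static {
        for (int i = 0; i < LOCKS.length; i++) {
            LOCKS[i] = new Object();
        }
    }

    public void save(Employee emp) {
        int bucket = Math.floorMod(
                (emp.getFirstName() + "|" + emp.getHireDate()).hashCode(), LOCKS.length);
        synchronized (LOCKS[bucket]) {
            // check for an existing employee and insert only if absent
            // (the unique constraint on the table should stay in place regardless)
        }
    }
}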
That all said, this should absolutely all be done in the database using a simple PRIMARY KEY constraint on your table and appropriate error handling.
One of the main reasons of using databases is that they give you consistency.
You are volunteering to put some of that responsibility back into your application. That very much sounds like the wrong approach. Instead, you should study exactly which capabilities your database offers and try to make as much use of them as possible.
In that sense you try to fix a problem on the wrong level.
Pseudo code:
void save(Employee emp) {
    if (!isEmployeeExist(emp)) {
        // Hibernate api to save
    }
}

boolean isEmployeeExist(Employee emp) {
    // build and run query for finding the employee
    return true; // if employee exists, else return false
}
Good question. I would strongly suggest using MERGE (INSERT and UPDATE in a single DML statement) in this case. Let Oracle handle transactions and locks; it's the best fit for your case.
You should create a primary key or unique constraint (approach 1) regardless of any other solution, to preserve data integrity.
-- Sample statement
MERGE INTO employees e
USING (SELECT * FROM hr_records) h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
    UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
    INSERT (id, address)
    VALUES (h.emp_id, h.address);
Since the row is not inserted yet, isolation levels such as READ_COMMITTED/REPEATABLE_READ are not applicable to it.
The best option is to apply a DB constraint (unique). If that is not possible, then in a multi-node setup you can't achieve this through Java locks either, as a request can go to any node.
So, in that case, we need distributed-lock-like functionality. We can create a lock table in which we define, per table, that only one insert (or one collection of inserts) may be in progress from a node at a time.
Ex:
Table_Name | Lock_Acquired
emp        | 'N'
Now any code can read this row with READ_COMMITTED and try to update Lock_Acquired to 'Y'. Any other code in another thread, or on another node, won't be able to proceed further, and the lock will only be granted once the previous lock has been released.
This gives you a highly concurrent system which can avoid duplication; however, it will suffer from scalability issues. So decide accordingly what you want to achieve.
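A rough JDBC sketch of acquiring such a lock row; the table_locks table and its columns follow the example above, and SELECT ... FOR UPDATE is used here as one way of holding the row while the insert happens (an explicit update of Lock_Acquired would work similarly):
connection.setAutoCommit(false);
try (PreparedStatement lock = connection.prepareStatement(
        "SELECT Lock_Acquired FROM table_locks WHERE Table_Name = ? FOR UPDATE")) {
    lock.setString(1, "emp");
    lock.executeQuery(); // other threads/nodes block here until this transaction ends

    // do the duplicate check and the insert inside the same transaction

    connection.commit(); // releases the row lock
} catch (SQLException e) {
    connection.rollback();
    throw e;
}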

In MySQL, is it advisable to rely on uniqueness constraints or to manually check if the row is already present?

I have a table:
userId | subject
with uniqueness constraint on both combined.
Now I am writing thousands of rows to this table every few minutes. The data stream is coming from a queue and it might repeat. I have to, however, make sure that there is only one unique combination of userId, subject in the table.
Currently I rely on MySQL's uniqueness constraint, which throws an exception.
Another approach is to run a SELECT count(*) query to check whether the row is already present, and skip it if need be.
Since I want to write on average 4 rows per second, which approach is advisable?
Programming language: Java
EDIT:
Just in case I am not clear: the question here is whether relying on MySQL to throw an exception is better, or whether running a select query before the insert operation is better, in terms of performance.
I thought a SELECT query was less CPU/IO intensive than an INSERT query. If I run too many INSERTs, wouldn't that create many locks?
MySQL is ACID and employs transactional locking, so relying on its uniqueness constraints is very standard. Note that you can do this either via PRIMARY KEY or UNIQUE KEY (but favour the former if you can).
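A minimal sketch of that "just insert and let the constraint decide" approach over JDBC; the user_subject table name is hypothetical (only the two columns come from the question), and a duplicate is simply skipped:
String sql = "INSERT INTO user_subject (userId, subject) VALUES (?, ?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setLong(1, userId);
    ps.setString(2, subject);
    ps.executeUpdate();
} catch (SQLIntegrityConstraintViolationException e) {
    // the (userId, subject) pair is already present, so skip it
}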
A unique constraint is unique for the complete committed dataset.
There are several databases which allow you to set the "transaction isolation level".
userId  subject
A       1
B       2
-------------------------
A       2
A       3
The two rows above the line are committed; every connection can read them. The two rows below the line are currently being written within your transaction. Within this connection all four rows are visible.
If another thread / connection / transaction tries to store A-2, there will be an exception in one of the two transactions (the first one can commit its transaction, the second one can't).
Other isolation levels may fail earlier, but it is not possible to violate the unique-key constraint.

JDBC - execute SELECT and INSERT atomically across concurrent threads

I have searched the web for simple examples of this, but to no avail. I need to run a select and insert operation as an atomic unit in Java, using JDBC against an Oracle database.
Effectively I need to do the following:
Select code from users
Go through all codes until I find one that is not used (as users can be deleted there may be codes available in the middle of the range)
Insert new user with that available code
This is a simple operation normally, but as my application is multi-threaded I'm not sure how to go about it. Concurrent threads running at the same time could both try to insert using the same value for code.
There are a couple of workarounds or hacks that I can think of to do the job, but in general how can I lock the table to make this operation atomic? Most of what I've seen involves row locks, but as I'm not updating I don't see how that applies.
This is a tough problem to do entirely in SQL. Any solution is going to have race condition problems. If I were going to do it entirely in SQL, I'd use a deleted-code table. When users get deleted, you'd use some service to add their code to the deleted table. If the deleted-code table is empty, threads would use a sequence number to get their new code. Getting a code from the deleted table would need to be in a synchronized block because of the get-and-then-set nature of the multiple SQL operations. I don't think SQL transactions are going to help there. They may keep the data consistent, but if two threads use the same code then one of the two commits is going to throw an exception.
I think a better, and faster, mechanism would be to have a separate thread manage these deleted codes. It could persist them in the database but also keep a BlockingQueue of deleted codes for the other threads to consume. If there must be no holes and you are worried about crashing, it will need to validate the list of available holes by querying the user table at launch. It would not need to synchronize or use any SQL transactions, because only it would be deleting from the deleted-code table.
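A rough sketch of that single-manager idea; the freeCodes queue and the nextSequenceValue helper are illustrative, not part of the answer:
// one manager owns the pool of reusable codes; worker threads only take from the queue
private final BlockingQueue<Integer> freeCodes = new LinkedBlockingQueue<>();

// called by the manager thread when a user is deleted
void recycleCode(int code) {
    freeCodes.offer(code);
}

// called by worker threads that need a code for a new user
int nextCode() {
    Integer reused = freeCodes.poll();
    return reused != null ? reused : nextSequenceValue(); // fall back to the DB sequence
}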
Hope this helps.
I would lean toward putting the logic in a stored procedure. Use "select for update" to lock, then commit to unlock.
You can add a filter to your insert statement and retry logic on the client side, I guess:
determine an available code (proposed code)
perform the insert with a filter, and determine the number of rows from the executeUpdate result (0 means a concurrent thread grabbed this code; try again)
The insert would look something along these lines, where 3 is your new id, 'Joe' your new user, and :proposedCode the one you think is available:
INSERT INTO users
SELECT 3, :proposedCode, 'Joe'
FROM dual
WHERE :proposedCode NOT IN (SELECT code FROM users)
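A hedged JDBC sketch of that retry loop; the users columns follow the statements above, while newId and the findProposedCode helper are assumptions for illustration:
String sql = "INSERT INTO users (id, code, name) "
           + "SELECT ?, ?, ? FROM dual "
           + "WHERE ? NOT IN (SELECT code FROM users)";

int inserted = 0;
while (inserted == 0) {
    int proposedCode = findProposedCode(connection); // a code that currently looks free
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setInt(1, newId);
        ps.setInt(2, proposedCode);
        ps.setString(3, "Joe");
        ps.setInt(4, proposedCode);
        inserted = ps.executeUpdate(); // 0 means another thread grabbed the code: retry
    }
}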
How about:
insert into usertable (
    id,
    code,
    name
) values (
    user_id_sequence.nextval,
    (
        select min(newcode)
        from usertable, (
            select level newcode
            from dual
            connect by level <= (select max(code) + 1 from usertable)
        )
        where not exists (select 1 from usertable where code = newcode)
    ),
    'mynewusername'
)
EDIT:
changed to max(code) + 1, so if there is no gap available, there is a new code available.

Can I do an atomic MERGE in Oracle?

I have a couple instances of a J2EE app running in a single WebLogic cluster.
At some point, these apps do a MERGE to insert or update a record into the back-end Oracle database. The MERGE checks to see if a row with a specified primary key is there or not. If it's there, update. If not, insert.
Now suppose two app instances want to insert or update a row with primary key = 100, and suppose the row doesn't exist. During the "check" stage of the merge, they both see that the row is not there, so both of them attempt to insert. Then I get a unique key constraint violation.
My question is this: Is there an atomic MERGE in Oracle? I'm looking for something that has a similar effect to INSERT ... FOR UPDATE in PL/SQL except that I can only execute SQL from my apps.
EDIT: I was unclear. I AM using the MERGE statement, yet this error still occurs. The thing is, only the "modifying" part is atomic, not the whole merge.
This is not a problem with MERGE as such. Rather the issue lies in your application. Consider this stored procedure:
create or replace procedure upsert_t23
    ( p_id   in t23.id%type
    , p_name in t23.name%type )
is
    cursor c is
        select null
        from t23
        where id = p_id;
    dummy varchar2(1);
begin
    open c;
    fetch c into dummy;
    if c%notfound then
        insert into t23
        values (p_id, p_name);
    else
        update t23
        set name = p_name
        where id = p_id;
    end if;
    close c;
end;
So, this is the PL/SQL equivalent of a MERGE on T23. What happens if two sessions call it simultaneously?
SSN1> exec upsert_t23(100, 'FOX IN SOCKS')
SSN2> exec upsert_t23(100, 'MR KNOX')
SSN1 gets there first, finds no matching record and inserts a record. SSN2 gets there second but before SSN1 commits, finds no record, inserts a record and hangs because SSN1 has a lock on the unique index node for 100. When SSN1 commits SSN2 will hurl a DUP_VAL_ON_INDEX violation.
The MERGE statement works in exactly the same way. Both sessions will check on (t23.id = 100), not find it and go down the INSERT branch. The first session will succeed and the second will hurl ORA-00001.
One way to handle this is to introduce pessimistic locking. At the start of the UPSERT_T23 procedure we lock the table:
...
lock table t23 in row shared mode nowait;
open c;
...
Now, SSN1 arrives, grabs the lock and proceeds as before. When SSN2 arrives it can't get the lock, so it fails immediately. Which is frustrating for the second user but at least they are not hanging, plus they know someone else is working on the same record.
There is no syntax for INSERT which is equivalent to SELECT ... FOR UPDATE, because there is nothing to select. And so there is no such syntax for MERGE either. What you need to do is include the LOCK TABLE statement in the program unit which issues the MERGE. Whether this is possible for you depends on the framework you're using.
The MERGE statement in the second session cannot "see" the insert that the first session did until that session commits. If you reduce the size of the transactions, the probability that this will occur will be reduced.
Or, can you sort or partition your data so that all records with a given primary key are handled by the same session? A simple function like "primary key mod N" should distribute the keys evenly across N sessions.
By the way, if two records have the same primary key, the second will overwrite the first. Sounds a little odd.
Yes, and it's called.... MERGE
EDIT: The only way to get this watertight is to insert, catch the dup_val_on_index exception and handle it appropriately (update, or perhaps insert a different record). This can easily be done with PL/SQL, but you can't use that.
You're also looking for workarounds. Can you catch the dup_val_on_index in Java and issue an extra UPDATE again?
In pseudo-code:
try {
    // MERGE
}
catch (dup_val_on_index) {
    // UPDATE
}
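In actual JDBC that might look roughly like this, using the t23 table from the earlier answer; the check on getErrorCode() == 1 assumes Oracle's ORA-00001 for a unique constraint violation:
String merge = "MERGE INTO t23 d USING (SELECT ? id, ? name FROM dual) s "
             + "ON (d.id = s.id) "
             + "WHEN MATCHED THEN UPDATE SET d.name = s.name "
             + "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)";
try (PreparedStatement ps = connection.prepareStatement(merge)) {
    ps.setInt(1, id);
    ps.setString(2, name);
    ps.executeUpdate();
} catch (SQLException e) {
    if (e.getErrorCode() != 1) { // ORA-00001: unique constraint violated
        throw e;
    }
    // another session slipped its INSERT in first, so just UPDATE
    try (PreparedStatement upd = connection.prepareStatement(
            "UPDATE t23 SET name = ? WHERE id = ?")) {
        upd.setString(1, name);
        upd.setInt(2, id);
        upd.executeUpdate();
    }
}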
I am surprised that MERGE would behave the way you describe, but I haven't used it sufficiently to say whether it should or not.
In any case, you might have the transactions that wish to execute the merge set their isolation level to SERIALIZABLE. I think that may solve your issue.
