I had to clean up a database (a few tables with a given condition, where the columns used in the conditions are always the same), e.g.
delete from table1 where date < given_date1 and id = given_id
delete from table2 where date < given_date2 and id = given_id
The relation between given_id and given_date varies both from table to table and from id to id.
The actual delete condition is not always where date < given_date; that was just an example. Say one id has 300 days of data and another has 500 days of data: the where condition is allowed to delete the oldest 10 days of data, where 10 is a variable based on user input. In one iteration every id is processed, deleting its oldest 10 days of data, so the query changes for each id, but it always runs against the same set of tables.
Earlier this was written as a SQL script and did the job, but it was slow. Now I have implemented a multithreaded Java application where the new code looks like:
for (int i = 0; i < idCount; i++) {
    // launch a new thread and, on that thread, call:
    delete(date, currentId);
}

void delete(Date date, long id) {
    // DELETE FROM table1 WHERE date < given_date AND id = given_id
    // DELETE FROM table2 WHERE date < given_date AND id = given_id
}
After implementing this I found deadlocks on the SQL tables, which were solved by indexing the tables, but it is still not as fast as it should be. If I have 500 threads, they are all launched one after another and obviously run against the same set of tables. Is SQL not actually executing in parallel on each table?
When I monitor java.exe and sqlserver.exe, neither is busy at all, though I would expect them to be.
Could anyone tell me the best approach to implement a multithreaded delete on the same set of tables, so that I can increase the thread count, run the deletions in parallel, and consume the available resources?
If all the actions are deletes for given ids, then I would just do one delete on each table covering all the ids at once.
e.g.
delete from table1 where date < given_date and id in (given_id1, given_id2, ...)
If there are lots of given_ids, first insert them into a temporary table, then execute each delete by joining the target table with the temporary table.
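For instance, a minimal JDBC sketch of the temporary-table variant (SQL Server temp-table syntax; dataSource, idsToPurge and the single cutoff date are illustrative assumptions, since in the real case the cutoff would be computed per id):

try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);
    try (Statement st = conn.createStatement()) {
        st.execute("CREATE TABLE #ids (id BIGINT PRIMARY KEY)"); // session-local temp table
    }
    try (PreparedStatement ins = conn.prepareStatement("INSERT INTO #ids VALUES (?)")) {
        for (long id : idsToPurge) {
            ins.setLong(1, id);
            ins.addBatch();
        }
        ins.executeBatch();
    }
    try (PreparedStatement del = conn.prepareStatement(
            "DELETE t FROM table1 t JOIN #ids i ON t.id = i.id WHERE t.date < ?")) {
        del.setDate(1, cutoff); // repeat for table2
        del.executeUpdate();
    }
    conn.commit();
}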
Also, if trying to use multiple threads, an improvement is really only expected if each thread acts on a different table, so there is no contention in the database.
Ignoring the problem you created...
Why not use an IN clause?
delete from table1 where date < given_date and id IN (id1, id2, id3, ...)
Update based on clarification:
Based on the explanation in the comments, my guess is that you don't have good indexes and every delete statement is resulting in a table scan. Each table scan locks the table and thus the database can only process one statement at a time. Index the date and id columns along with any other column used in the where clause of your delete statement.
In my personal experience, I make a class to manage my queries and the communication with the database. I use a thread pool to manage my threads and simply have the threads make calls to my static database manager. The manager should have a synchronized method that acquires a lock on the database connection. The threads will then be able to access the database without their actions conflicting with each other.
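A rough sketch of that shape, with illustrative names rather than any particular library's API:

public final class DatabaseManager {
    private static final Object LOCK = new Object();
    private static Connection connection; // opened once at startup

    public static void delete(java.sql.Date date, long id) {
        synchronized (LOCK) { // one statement at a time on the shared connection
            try (PreparedStatement ps = connection.prepareStatement(
                    "DELETE FROM table1 WHERE date < ? AND id = ?")) {
                ps.setDate(1, date);
                ps.setLong(2, id);
                ps.executeUpdate();
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }
    }
}

// The pool replaces hand-launched threads; cutoffFor(id) is a hypothetical helper.
ExecutorService pool = Executors.newFixedThreadPool(8);
for (long id : ids) {
    pool.submit(() -> DatabaseManager.delete(cutoffFor(id), id));
}
pool.shutdown();

Note that the synchronized block serializes the actual database calls, so this protects consistency rather than adding parallelism.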
If you don't care about making all the commands one transaction unit, put each delete in its own (small) transaction.
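In JDBC terms, a sketch of that (conn is an open connection):

conn.setAutoCommit(true); // each statement now commits as its own small transaction
try (PreparedStatement ps = conn.prepareStatement(
        "DELETE FROM table1 WHERE date < ? AND id = ?")) {
    ps.setDate(1, givenDate);
    ps.setLong(2, givenId);
    ps.executeUpdate(); // committed immediately, locks are released right away
}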
I have an application written in Java that is running on multiple JVMs and writing to the same database.
I have an entity called request that has these columns: id, userId, text.
I want to limit the number of requests that a user can post to 3.
I cannot write code like:
line1 open transaction
line2 check if requests for that userId are less than 3
line3 if yes insert that record
line4 else abort
line5 close transaction
because, since there are two JVMs, it's possible that one thread of jvm1 is executing line2 while another thread of jvm2 is executing the same line.
So both threads, in their respective transactions (tx1 and tx2, in jvm1 and jvm2), think that there are fewer than 3 records and will commit the transaction.
One solution could be to write a flag in a different table and use that flag to lock that part of the code, but I would prefer another solution.
Do you have any ideas?
Update:
I'm using JPA, so I'm looking for a way to achieve this while still using it.
You'll also have a users table, the PK of which is referenced by userId.
One solution is to take a lock on the corresponding row in the users table before updating rows in the request table. Concurrent transactions trying to operate on requests for the same user have to wait to obtain the same lock until the previous transaction has committed / rolled back. Like:
BEGIN;
SELECT FROM users
WHERE id = 123
FOR NO KEY UPDATE; -- strong enough
INSERT INTO request (...)
SELECT ... -- your constants here
FROM request
WHERE userid = 123
HAVING count(*) < 3; -- ①
DELETE ...
UPDATE ...
COMMIT;
① While the SELECT list only has constants, the query works without GROUP BY. HAVING count(*) < 3 skips the INSERT if at least 3 rows are found - without raising an exception.
FOR NO KEY UPDATE is weaker than FOR UPDATE in that it still allows concurrent writes as long as key columns are not changed. Good enough for your task.
You just need to make certain that all relevant write access to request takes this approach.
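Since you mention JPA: a minimal sketch of the same pattern with an EntityManager, assuming entities User and Request mapped to those tables (JPA's PESSIMISTIC_WRITE maps to FOR UPDATE, which is a bit stronger than FOR NO KEY UPDATE but serves the same purpose here):

EntityTransaction tx = em.getTransaction();
tx.begin();
try {
    // Lock the parent row; concurrent transactions on the same user wait here.
    User user = em.find(User.class, 123L, LockModeType.PESSIMISTIC_WRITE);
    long count = em.createQuery(
            "select count(r) from Request r where r.userId = :uid", Long.class)
        .setParameter("uid", user.getId())
        .getSingleResult();
    if (count < 3) {
        em.persist(new Request(user.getId(), "some text"));
    }
    tx.commit();
} catch (RuntimeException e) {
    if (tx.isActive()) tx.rollback();
    throw e;
}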
I have a table named booking. The primary key is booking_id. The table has a column named booking_num. booking_num is used to identify the booking.
Another table is booking_hist. The purpose is to transfer completed booking records from the booking table into the booking_hist table after a certain period, let's say 2 days. The column to identify this is completed_dt.
For example, completed_dt is '08-SEP-19'.
What I'm trying to achieve is that, immediately 2 days after this date, the record is moved into the booking_hist table.
Should there be any null<->non-null conversions of the column?
What is the logic I need to achieve this? How can I get the date to count 2 days?
You can schedule a SQL Agent job that runs daily and calls a stored procedure to go through the active bookings and check completed_dt, like:
-- add your insert here, e.g. INSERT INTO booking_hist (...)
SELECT *
FROM booking b
LEFT JOIN booking_hist h
  ON b.booking_id = h.booking_id
WHERE h.booking_id IS NULL
  AND b.completed_dt IS NOT NULL
  AND b.completed_dt < DATEADD(DAY, -2, GETDATE());
This sounds like the kind of thing that should happen in a scheduled job.
I would add a ready_for_archive column to booking: just a boolean flag.
I would have a query that marks all bookings completed before a specified date/time, passing in the date of 2 days ago from Java. Something like:
UPDATE booking
SET ready_for_archive = 1
WHERE completed_dt <= :MY_START_DATE
Then I would add all of these records to the historical table:
INSERT INTO booking_hist
SELECT * FROM booking
WHERE ready_for_archive = 1;
Then remove them from the booking table:
DELETE FROM booking
WHERE ready_for_archive = 1;
Marking the records to be archived before doing that process means there's no risk of accidentally deleting records that were one second too young to be copied.
Passing in the date after calculating it in Java makes the SQL query more generic and reusable.
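A rough JDBC sketch of those three steps run as one transaction (dataSource is assumed; table and column names as above):

java.time.LocalDate cutoff = java.time.LocalDate.now().minusDays(2);
try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);
    try (PreparedStatement mark = conn.prepareStatement(
             "UPDATE booking SET ready_for_archive = 1 WHERE completed_dt <= ?");
         Statement move = conn.createStatement()) {
        mark.setDate(1, java.sql.Date.valueOf(cutoff));
        mark.executeUpdate();
        move.executeUpdate(
            "INSERT INTO booking_hist SELECT * FROM booking WHERE ready_for_archive = 1");
        move.executeUpdate(
            "DELETE FROM booking WHERE ready_for_archive = 1");
        conn.commit();
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    }
}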
Create a stored procedure to move the rows from one table to the other.
How do I call the stored procedure in Oracle from a daily scheduled job?
In the where condition, add (assuming completed_dt is a DATE column):
trunc(sysdate) - trunc(completed_dt) > 2
Please refer to the link below for other options:
How can I get the number of days between 2 dates in Oracle 11g?
I'm trying to implement a counter with Java, Spring, Hibernate and Oracle SQL. Each record represents a count for a given timestamp. Let's say each record is uniquely identified by the minute, and each record holds a count column. The service should expect to receive a ton of concurrent requests that may update the counter column of possibly the same record.
In my table, if the record does not exist, just insert the record in and set its count to 1. Otherwise, find the record by timestamp and increase its existing counter column by 1.
In order to ensure data consistency and integrity, I'm using pessimistic locking. For example, if 20 counts come in at the same time, not necessarily from the same user, it's possible that we overwrite the record from a stale read of that record before updating. With locking, I'm ensuring that if 20 counts come in, the net effect on the database represents all 20 counts.
So locking is fine, but the problem is that if the record never existed in the first place, and two or more concurrent requests come in trying to update the not-yet-existent record, I've observed that a duplicate record gets inserted, because we cannot lock a record that doesn't exist yet. How can we ensure that no duplicates get created in the table? Should it be controlled via Oracle? Or can I manage this via my app and Hibernate?
Thank you.
One way to avoid this sort of problem altogether would be to generate the count at the time you actually query the data. Oracle has an analytic function, ROW_NUMBER(), which can assign a row number to each record in the result set of a query. As a rough example, consider the following query:
SELECT
    ts,
    ROW_NUMBER() OVER (ORDER BY ts) rn
FROM yourTable
The count you want would be in the rn column, representing the number of records appearing since the first entry in the table. Of course, you could further restrict the query.
This approach is robust to removing records, as the count would always start with 1. One drawback is that row number functionality is not supported by Hibernate. You would have to run this either as a native query or a stored proc.
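For example, a short native-query sketch with a Hibernate Session (yourTable is the placeholder from above):

@SuppressWarnings("unchecked")
List<Object[]> rows = session.createNativeQuery(
        "SELECT ts, ROW_NUMBER() OVER (ORDER BY ts) rn FROM yourTable")
    .getResultList();
for (Object[] row : rows) {
    Object ts = row[0];             // the timestamp
    Number count = (Number) row[1]; // the running count for that row
}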
I have searched the web for simple examples of this, but to no avail. I need to run a select and insert operation as an atomic unit in Java, using JDBC against an Oracle database.
Effectively I need to do the following:
Select code from users
Go through all codes until I find one that is not used (as users can be deleted there may be codes available in the middle of the range)
Insert new user with that available code
This is normally a simple operation, but as my application is multithreaded I'm not sure how to go about it: concurrent threads running at the same time could both try to insert using the same value for code.
There are a couple workarounds or hacks that I can think of to do the job but in general how can I lock the table to make this operation atomic? Most of what I've seen involves row locks but as I'm not updating I don't see how this applies.
This is a tough problem to do entirely in SQL. Any solution is going to have race condition problems. If I were going to do it entirely in SQL, I'd use a deleted code table. When users get deleted, a service would add their code to the deleted table. If the deleted code table is empty, threads would use a sequence number to get their new code. Getting a code from the deleted table would need to be in a synchronized block because of its get-then-set nature across multiple SQL operations. I don't think SQL transactions are going to help there; they may keep the data consistent, but if two threads use the same code then one of the two commits is going to throw an exception.
I think a better, and faster, mechanism would be to have a separate thread manage these deleted codes. It could write it in a database but also keep a BlockingQueue of deleted codes for the other threads to consume. If there must be no holes and you are worried about crashing then it will need to validate the list of available holes by querying the user table at launch. It would not need to synchronize or do any SQL transactions because only it would be deleting from the deleted code table.
Hope this helps.
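A rough sketch of the allocator described above; the class and method names are illustrative, and the starting values would be loaded by querying the user table at launch:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class CodeAllocator {
    private final BlockingQueue<Integer> recycledCodes = new LinkedBlockingQueue<>();
    private final AtomicInteger sequence;

    public CodeAllocator(int firstUnusedCode) {
        this.sequence = new AtomicInteger(firstUnusedCode);
    }

    // Called by whatever service deletes users, so their codes become reusable.
    public void recycle(int code) {
        recycledCodes.offer(code);
    }

    // Called by threads creating users: reuse a hole if one exists,
    // otherwise take the next value from the sequence.
    public int nextCode() {
        Integer recycled = recycledCodes.poll();
        return recycled != null ? recycled : sequence.getAndIncrement();
    }
}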
I would lean toward putting the logic in a stored procedure. Use "select for update" to lock, then commit to unlock.
You can add a filter to your insert statement and retry logic on the client end, I guess:
determine an available code (proposed code)
perform the insert with a filter and determine the number of rows from the executeUpdate result (0 means a concurrent thread grabbed this code; try again)
The insert would look something along these lines where 3 is your new id, 'Joe' your new user, and proposedCode the one you think is available:
INSERT INTO users
SELECT 3, :proposedCode, 'Joe'
FROM dual
WHERE :proposedCode NOT IN (SELECT code FROM users)
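Wired up with JDBC (positional ? parameters instead of the named :proposedCode binds; findAvailableCode() is a hypothetical helper that proposes a free code):

String sql = "INSERT INTO users "
           + "SELECT ?, ?, ? FROM dual "
           + "WHERE ? NOT IN (SELECT code FROM users)";
int inserted = 0;
while (inserted == 0) {
    int proposedCode = findAvailableCode(); // hypothetical: scan for a gap
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, 3);                // the new id
        ps.setInt(2, proposedCode);     // the code you think is free
        ps.setString(3, "Joe");         // the new user
        ps.setInt(4, proposedCode);
        inserted = ps.executeUpdate();  // 0 => another thread grabbed the code
    }
}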
How about:
insert into usertable (
    id,
    code,
    name
) values (
    user_id_sequence.nextval,
    ( select min(newcode)
      from usertable, ( select level newcode
                        from dual
                        connect by level <= (select max(code) + 1 from usertable) )
      where not exists (select 1 from usertable where code = newcode)
    ),
    'mynewusername'
)
EDIT:
changed to max(code) + 1, so if there is no gap available, there is a new code available.
I have a couple instances of a J2EE app running in a single WebLogic cluster.
At some point, these apps do a MERGE to insert or update a record into the back-end Oracle database. The MERGE checks to see if a row with a specified primary key is there or not. If it's there, update. If not, insert.
Now suppose two app instances want to insert or update a row with primary key = 100. Suppose the row doesn't exist. During the "check" stage of the merge, they both see that the row is not there, so both of them attempt to insert. Then I get a unique key constraint violation.
My question is this: Is there an atomic MERGE in Oracle? I'm looking for something that has a similar effect to INSERT ... FOR UPDATE in PL/SQL except that I can only execute SQL from my apps.
EDIT: I was unclear. I AM using the MERGE statement, yet this error still occurs. The thing is, only the "modifying" part is atomic, not the whole merge.
This is not a problem with MERGE as such. Rather the issue lies in your application. Consider this stored procedure:
create or replace procedure upsert_t23
    ( p_id   in t23.id%type
    , p_name in t23.name%type )
is
    cursor c is
        select null
        from t23
        where id = p_id;
    dummy varchar2(1);
begin
    open c;
    fetch c into dummy;
    if c%notfound then
        insert into t23
        values (p_id, p_name);
    else
        update t23
        set name = p_name
        where id = p_id;
    end if;
    close c;
end;
So, this is the PL/SQL equivalent of a MERGE on T23. What happens if two sessions call it simultaneously?
SSN1> exec upsert_t23(100, 'FOX IN SOCKS')
SSN2> exec upsert_t23(100, 'MR KNOX')
SSN1 gets there first, finds no matching record and inserts a record. SSN2 gets there second but before SSN1 commits, finds no record, inserts a record and hangs because SSN1 has a lock on the unique index node for 100. When SSN1 commits SSN2 will hurl a DUP_VAL_ON_INDEX violation.
The MERGE statement works in exactly the same way. Both sessions will check on (t23.id = 100), not find it and go down the INSERT branch. The first session will succeed and the second will hurl ORA-00001.
One way to handle this is to introduce pessimistic locking. At the start of the UPSERT_T23 procedure we lock the table:
...
lock table t23 in row share mode nowait;
open c;
...
Now, SSN1 arrives, grabs the lock and proceeds as before. When SSN2 arrives it can't get the lock, so it fails immediately. Which is frustrating for the second user but at least they are not hanging, plus they know someone else is working on the same record.
There is no syntax for INSERT which is equivalent to SELECT ... FOR UPDATE, because there is nothing to select. And so there is no such syntax for MERGE either. What you need to do is include the LOCK TABLE statement in the program unit which issues the MERGE. Whether this is possible for you depends on the framework you're using.
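For example, a rough JDBC sketch that issues the LOCK TABLE and the MERGE in the same transaction (the MERGE text follows the t23 example; your real statement will differ):

conn.setAutoCommit(false);
try (Statement lock = conn.createStatement();
     PreparedStatement merge = conn.prepareStatement(
         "merge into t23 d "
       + "using (select ? as id, ? as name from dual) s "
       + "on (d.id = s.id) "
       + "when matched then update set d.name = s.name "
       + "when not matched then insert (id, name) values (s.id, s.name)")) {
    lock.execute("lock table t23 in row share mode nowait"); // fails fast if locked
    merge.setInt(1, 100);
    merge.setString(2, "FOX IN SOCKS");
    merge.executeUpdate();
    conn.commit();   // the commit releases the table lock
} catch (SQLException e) {
    conn.rollback(); // also releases the lock
    throw e;
}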
The MERGE statement in the second session cannot "see" the insert that the first session did until that session commits. If you reduce the size of the transactions, the probability that this will occur will be reduced.
Or can you sort or partition your data so that all records with a given primary key are handled by the same session? A simple function like "primary key mod N" should distribute them evenly across N sessions.
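A trivial sketch of that routing (primaryKey, numberOfSessions and sessionQueues are illustrative):

int worker = (int) (primaryKey % numberOfSessions); // every record with this key lands on the same session
sessionQueues[worker].put(record);                  // hand the record to that session's queue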
btw, if two records have the same primary key, the second will overwrite the first. Sounds a little odd.
Yes, and it's called.... MERGE
EDIT: The only way to get this watertight is to insert, catch the dup_val_on_index exception and handle it appropriately (update, or insert another record, perhaps). This can easily be done with PL/SQL, but you can't use that.
Since you're also looking for workarounds: can you catch the dup_val_on_index error in Java and issue an extra UPDATE?
In pseudo-Java (via JDBC, ORA-00001 surfaces as vendor error code 1):
try {
    // execute the MERGE statement
} catch (SQLException e) {
    if (e.getErrorCode() == 1) { // ORA-00001: unique constraint violated
        // execute the UPDATE instead
    } else {
        throw e;
    }
}
I am surprised that MERGE would behave the way you describe, but I haven't used it sufficiently to say whether it should or not.
In any case, you might have the transactions that wish to execute the merge set their isolation level to SERIALIZABLE. I think that may solve your issue.
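In JDBC that is a single call on the connection before the transaction starts (conn is an open java.sql.Connection):

conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);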