Commit changes while keeping the lock - Java

I have an application that works with a database table like
Id, state, procdate, result
When some data needs to be processed, the app sets the state to PROCESSING. After processing, the result is written to the result column and the state goes to STANDBY.
To make the first switch to PROCESSING I start a transaction, do a SELECT FOR UPDATE, then update the state and procdate.
Then I do the work and, again using SELECT FOR UPDATE, update the state and the result.
The processing may take up to 5 minutes. The state switching is needed to see how many rows are in progress. The problem is that another request for processing may occur, and it has to wait until the first processing ends.
So I want to keep the row locked. But if I issue the SELECT FOR UPDATE just after committing the PROCESSING state, a second request may intercept and lock the row first.
So how can I both keep the lock and commit the changes?

You'll need to handle this in your design. Here is an idea.
Your records initially have a status, say 'READY', and a processing id that is initially null.
When you start, update the status to 'PROCESSING' and set the processing id to a value for the job run; this can come from a sequence within Oracle, such that it is unique for your process run. Commit.
The process then runs with the same id, selecting rows whose status is 'PROCESSING' and whose processing id matches its own. Complete the processing, update the status to 'COMPLETE' (or 'STANDBY' as you have it), and commit.
This allows a second process to select new 'READY' records and claim them for its own processing without interfering with the already running process.
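A minimal sketch of that pattern in JDBC, assuming a Connection conn with auto-commit disabled, a hypothetical table tasks(id, state, procdate, result, job_id), and an Oracle sequence job_seq (all made-up names):

    long jobId;
    try (Statement s = conn.createStatement();
         ResultSet rs = s.executeQuery("SELECT job_seq.NEXTVAL FROM dual")) {
        rs.next();
        jobId = rs.getLong(1);
    }
    // Claim the rows and commit immediately; no row lock is held while processing.
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE tasks SET state = 'PROCESSING', procdate = SYSDATE, job_id = ? " +
            "WHERE state = 'READY'")) {
        ps.setLong(1, jobId);
        ps.executeUpdate();
    }
    conn.commit();
    // ... the long-running processing happens here, outside any transaction ...
    // Only rows claimed under this job id are touched when writing the result.
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE tasks SET state = 'STANDBY', result = ? " +
            "WHERE state = 'PROCESSING' AND job_id = ?")) {
        ps.setString(1, resultValue); // resultValue: hypothetical processing output
        ps.setLong(2, jobId);
        ps.executeUpdate();
    }
    conn.commit();

Because the claim is keyed on the job id rather than on a held lock, other sessions can see how many rows are in progress while the work runs unlocked.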

Here are two approaches I have taken. (I provide a third, but have never had to take that approach.)
1) Why not exit the transaction after committing the changes?
2) If option 1 is not viable, then you could simply:
COMMIT the changes
attempt to re-acquire the lock; if you fail, leave the screen, else just continue (see the sketch at the end of this answer).
3) If it is absolutely imperative that no one can ever acquire the lock in the middle of a commit... you could actually lock another object. I will admit, I have never had to take this approach, but it would be as follows:
Initial phase
LOCK GLOBALOBJECT
Attempt to Acquire record lock for table
UNLOCK GLOBALOBJECT
Test to see if record lock was attained
Phase for committing the change
LOCK GLOBALOBJECT
COMMIT change
Acquire record lock for table
UNLOCK GLOBALOBJECT
Test to see if the record lock was attained (failure here should never happen...)
I have never needed this kind of logic, and I really do not like it since it requires a GLOBAL locking object for this table. Again, it depends on your code, and the criticality of someone being able to commit changes while still being in the transaction.
However, just make sure you are not gold-plating your code when simply exiting the transaction after committing a change would be fine for your stakeholders.
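A sketch of option 2, assuming Oracle-style FOR UPDATE NOWAIT on a hypothetical tasks table; when the row is already locked by someone else, ORA-00054 surfaces as a SQLException:

    conn.commit(); // make the changes visible to other sessions
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT id FROM tasks WHERE id = ? FOR UPDATE NOWAIT")) {
        ps.setLong(1, rowId);
        ps.executeQuery(); // lock re-acquired: just continue
    } catch (SQLException e) {
        // someone grabbed the row between the commit and the re-lock
        leaveScreen(); // hypothetical: back out of the current screen
    }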

Related

use database as a queue of tasks

In one of our Java applications (based on a PostgreSQL DB), we have a database table that maintains a list of tasks to be executed.
Each row has a JSON blob for the details of a task as well as a scheduled time value.
We have a few Java workers/threads whose job is to search for tasks that are ready for execution (based on the schedule value), execute them, and delete them from the table. Execution of a task may take a few seconds.
The problem is that more than one worker may grab the same row, causing duplicate execution of a task, which is something we want to avoid.
One approach is, when doing the select to grab a row, to do it with FOR UPDATE to lock the row, supposedly preventing other workers from grabbing the same locked row.
My concern with this approach is that the row is only locked while the select transaction is executing in the DB (according to this); once the Java code is actually executing the selected row/task, the lock is gone and another worker can grab it again.
Can someone shed some light on whether the above approach is going to work for sure? Thanks!
Treat the DB calls as atomic instructions and design lock-free algorithms around your table, using updates to change a boolean column "in_progress" from false to true. It could also just be a state int (0 = available, 1 = in progress, N = result code).
Make sure you have a partial index on state 0 (and possibly 1, to recover from crashes by finding tasks in progress), so that the ...WHERE state = 0 remains selective and fast (on top of the scheduled-time index, of course).
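A minimal sketch of that claim in Java/JDBC, assuming PostgreSQL, auto-commit on (each statement is its own transaction), and a hypothetical tasks(id, details, scheduled_time, state) table with a partial index like CREATE INDEX ON tasks (scheduled_time) WHERE state = 0:

    String claim =
        "UPDATE tasks SET state = 1 " +
        "WHERE id = (SELECT id FROM tasks " +
        "            WHERE state = 0 AND scheduled_time <= now() " +
        "            ORDER BY scheduled_time LIMIT 1) " +
        "AND state = 0 " +
        "RETURNING id, details";
    try (Statement s = conn.createStatement();
         ResultSet rs = s.executeQuery(claim)) { // pgJDBC returns RETURNING rows as a result set
        if (rs.next()) {
            long id = rs.getLong("id");
            String details = rs.getString("details");
            // execute the task, then delete the row (or set state to a result code)
        }
        // an empty result just means another worker won the race; poll again
    }

The UPDATE itself is atomic, so even if two workers pick the same id in the subquery, only one flips state from 0 to 1; the loser simply gets no row back.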
Hope this helps.
When one thread has successfully locked the row on a given connection, another one attempting to obtain a lock on the same row on a different connection should fail. You should issue the SELECT FOR UPDATE with some kind of NO WAIT clause to request immediate failure if the row is locked.
Now, this doesn't solve the query-vs-lock race, since a failed lock would interrupt a thread's execution. You can solve that by doing the following in each execution:
Select all records with new tasks (regardless of whether they're being processed or not).
For each new task returned in step 1, run a matching SELECT FOR UPDATE, then continue with processing the task if the lock succeeds.
If any lock attempt fails, skip the task without failing the entire process.
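A sketch of those steps, assuming PostgreSQL (where a failed NOWAIT lock raises error 55P03) and hypothetical table and helper names:

    conn.setAutoCommit(false);
    List<Long> candidates = new ArrayList<>();
    try (Statement s = conn.createStatement();
         ResultSet rs = s.executeQuery(
             "SELECT id FROM tasks WHERE scheduled_time <= now()")) {
        while (rs.next()) candidates.add(rs.getLong(1));
    }
    for (long id : candidates) {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id FROM tasks WHERE id = ? FOR UPDATE NOWAIT")) {
            ps.setLong(1, id);
            ps.executeQuery();           // lock acquired: safe to process this task
            processAndDelete(conn, id);  // hypothetical helper
            conn.commit();
        } catch (SQLException lockFailed) {
            conn.rollback();             // another worker holds it: skip this task
        }
    }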

What happens during concurrent access with JPA and exclusive locks?

We are currently having a discussion about our architecture.
The idea is that we have a database and multiple processing units. One transaction includes the following steps:
Query an entry based on a flag
Request exclusive lock for this entry
Do the processing
Update the Flag and some other columns
Release lock and commit transaction
But what happens if the second processing unit queries an entry, while the first one does hold a lock?
Updating the flag during the transaction does not do the job, due to transaction isolation and dirty reads.
In my opinion, the possible outcomes of this situation are:
The second processing unit gets the same entry and an exception is raised during the lock request.
The second one gets the next available entry and everything is fine.
The second processing unit gets the same entry.
However, whether an exception will be thrown or the lock acquisition will block until the lock is released by the first transaction depends on the way you ask for the lock in the second transaction (for example, with a timeout, or NO WAIT, or something similar).
The second scenario (the second unit gets the next available entry) is a bit harder to implement. Concurrent transactions are isolated from each other, so they basically see the same snapshot of the data until one of them commits.
You can take a look at some database-specific features, like Oracle Advanced Queuing, or you could change the way you read data (for example, read it in batches and then dispatch the processing to multiple threads/transactions). All of this highly depends on what exactly you are solving: are there any processing-order constraints, failure/rollback/retry handling, etc.?
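In JPA terms, the block-or-fail behavior is controlled by a lock-timeout hint. A minimal sketch, assuming a hypothetical Task entity and an open EntityManager em; the standard javax.persistence.lock.timeout hint takes milliseconds, and 0 requests immediate failure where the provider and database support it:

    Map<String, Object> hints = new HashMap<>();
    hints.put("javax.persistence.lock.timeout", 0); // 0 ms = fail fast instead of blocking
    try {
        // Task is a hypothetical entity mapped to the processed table
        Task entry = em.find(Task.class, id, LockModeType.PESSIMISTIC_WRITE, hints);
        // lock acquired: process, update the flag, commit
    } catch (LockTimeoutException e) {
        // the first unit holds the lock: move on to the next available entry
    }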

Limit concurrent execution of an application process using database

I was asked to implement an "access policy" to limit the number of concurrent executions of a certain process within an application (NOT a web application) which has a direct connection to the database.
The application runs on several machines, and if more than one user tries to call the process, only one execution should be allowed at a given time; the others must return an error message (NOT wait for the first execution to end).
Although I'm using Java/Postgres, this is sort of a general question.
Given that I have the same application running in several machines, the simplest solution I can think of is implementing some sort of "database flag".
Something like checking whether the process is currently active:
SELECT Active FROM Process
If it's active, return a 'concurrent access policy error'. If not, activate it:
UPDATE Process SET Active = 'Y'
Once the execution is finished, simply update the active flag:
UPDATE Process SET Active = 'N'
However, I've encountered a major issue:
If I don't use a DB transaction to change the active flag and the application is killed, the active flag will remain with the Y value forever.
If I use a DB transaction, the first point is solved. However, the change of the active flag on one host (from N to Y) will only be visible after the commit, so the other hosts will never read active with the Y value and will therefore execute anyway.
Any ideas?
Don't bother with an active flag, instead simply lock a row based on the user ID. Keep that row locked in a dedicated transaction/connection. When the other user tries to lock the row (using SELECT ... FOR UPDATE) you'll get an error, and you can report it.
If the process holding the transaction fails, the lock is freed. If it quits, the lock is freed. If the DB is rebooted, the lock is freed.
Win all around.
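A minimal sketch of that dedicated-connection lock, assuming PostgreSQL and a hypothetical single-row table process_lock(id) created up front; any row works, including one keyed by user ID as suggested above:

    Connection lockConn = DriverManager.getConnection(url, user, password);
    lockConn.setAutoCommit(false);
    try (Statement s = lockConn.createStatement()) {
        // NOWAIT makes the second caller fail immediately instead of queueing
        s.executeQuery("SELECT id FROM process_lock WHERE id = 1 FOR UPDATE NOWAIT");
        runTheProcess(); // hypothetical; the lock is held until commit/rollback/disconnect
        lockConn.commit();
    } catch (SQLException e) {
        lockConn.rollback();
        throw new IllegalStateException("concurrent access policy error", e);
    }

On PostgreSQL specifically, pg_try_advisory_lock would give you the same try-and-fail semantics without a lock table at all.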
Instead of having only a simple Y/N flag, store the timestamp at which active was set, and have your client application refresh it regularly (say, every minute, or every five minutes). Then if a client crashes, other clients just have to wait a bit longer than that time limit before assuming the client is dead and taking over. This is just a kind of "heartbeat" mechanism to check that the client that started the process is still alive.
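A sketch of that heartbeat, assuming a JDBC connection in auto-commit mode and made-up table/column names; remember to shut the executor down when the process finishes:

    ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor();
    heartbeat.scheduleAtFixedRate(() -> {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE process SET last_seen = now() WHERE active = 'Y'")) {
            ps.executeUpdate();
        } catch (SQLException ignored) {
            // a missed beat is fine; the next one will catch up
        }
    }, 0, 1, TimeUnit.MINUTES);
    // Other hosts treat the process as dead when now() - last_seen exceeds the limit.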
A simpler solution would be to configure the database to only accept one connection at a time?
I am not sure if a RDBMS is the best system to solve this kind of issue. But I recently implemented a similar thing in SQL Server 2012. So here's what I learned from that experience.
In general, I mean in principle, you need an atomic operation "check the value, update the value (of one single record)", i.e. an atomic SELECT/UPDATE. This makes the matter complex. And because there is normally no such standard single atomic operation in RDBMSs, you can get familiar with and use ISOLATION LEVEL SERIALIZABLE.
This is how I implemented it in SQL Server 2012, and I've seriously tested it, it's working fine. I have a table called DistributedLock, each record from it represents a logical lock. The operations I allow are tryLock and releaseLock (these are implemented as two stored procedures). The tryLock is non-blocking (practically non-blocking). If it succeeds, it returns some ID/stamp to the caller who can use that ID/stamp later to call releaseLock. If one calls releaseLock without actually holding the lock (without having the latest ID/stamp that is), the call succeeds and does nothing, otherwise (if the caller has the lock) the call succeeds and releases the lock held by the caller. I also have support for timeouts. So if some process grabs the ID/stamp of a given lock/record, and forgets to release it, it will expire automatically after some time.
Here is what the table looks like.
[DistributedLockID] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL -- surrogate PK
[ResourceID] [nvarchar](256) NOT NULL -- resource/lock logical identifier
[Duration] [int] NOT NULL
[AcquisitionTime] [datetime] NULL
[RecordStamp] [bigint] NOT NULL
I guess you can figure out the rest (or try, and then ping me if you get stuck).
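For illustration, the atomic "check the value, update the value" step can be sketched as a single conditional UPDATE (which is atomic on its own, even without SERIALIZABLE). This assumes the table above, that Duration is in seconds, and T-SQL date functions, called here through JDBC:

    String tryLock =
        "UPDATE DistributedLock " +
        "SET AcquisitionTime = GETDATE(), RecordStamp = RecordStamp + 1 " +
        "WHERE ResourceID = ? " +
        "AND (AcquisitionTime IS NULL " +
        "     OR DATEADD(second, Duration, AcquisitionTime) < GETDATE())";
    try (PreparedStatement ps = conn.prepareStatement(tryLock)) {
        ps.setString(1, "my-resource");
        boolean acquired = ps.executeUpdate() == 1; // exactly one row = lock is ours
        // if acquired, read back RecordStamp and hand it to the caller for releaseLock
    }

The WHERE clause only matches a free or expired lock, so at most one caller can win, and the incremented RecordStamp acts as the ID/stamp described above.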

better way to process records, select and update lock

I had a problem in my software that sometimes caused a lock on the SQL Server.
It was caused by a process that selects a group of records and starts processing them.
Based on some values and a calculation, the records get updated.
When a record is being updated, the page that record is on is locked by SQL Server against selects, which results in a lock that never resolves itself.
To solve the problem we created a second table from which we select; the main table is copied into it before the process starts, so the table that is updated is never selected from and no lock can appear.
What I am looking for is a simple and better solution for this problem, because to me this feels like a workaround for something I'm doing the wrong way, and I would really like to improve the processing.
Try changing the TRANSACTION ISOLATION LEVEL on the database.
I guess your default isolation level is set to REPEATABLE READ, which causes the select to place a shared lock on the returned records; a deadlock happens when concurrent requests come in. To solve this you should use a locking select, so that records are taken with an exclusive (X) lock rather than a shared (S) lock.
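On SQL Server, one way to take such a locking select is the UPDLOCK table hint. A minimal sketch via JDBC, with hypothetical table and column names:

    conn.setAutoCommit(false);
    try (Statement s = conn.createStatement();
         ResultSet rs = s.executeQuery(
             "SELECT id, amount FROM records WITH (UPDLOCK, ROWLOCK) " +
             "WHERE batch_id = 42")) {
        while (rs.next()) {
            // calculate and update each record inside the same transaction;
            // other readers that also take UPDLOCK will wait rather than deadlock
        }
    }
    conn.commit();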

Multiple thread selecting row from database optimisation

I have a Java application where 15 threads select rows from a table with 11,000 records through a synchronized method called getNext(). The threads are getting slow at selecting a row, thereby taking a huge amount of time. Each thread follows this process:
The thread checks if a row with the resume column value set to 1 exists.
A. If it exists, the thread takes the id of that row and uses that id to select another row with an id greater than the taken id.
B. Otherwise it selects a row with an id greater than 0.
The row selected, based on the outcome of the steps described in 1 above, is marked with the resume column set to 1.
The thread takes the row data and works on it.
Question:
How can multiple threads access the same table, selecting rows that another thread has not selected, and be fast?
How can threads be made to resume, in case of a crash, at the last row that was selected by any of the threads?
1.:
It seems the multiple database operations in getNext() are the bottleneck. If the data isn't changed by an outside source, you could read the "id" and "resume" values of all rows once and cache them. Then you would only have one query and would operate just in memory for reads. This would save a lot of expensive DB calls in getNext().
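A sketch of that cache, assuming a hypothetical work_table and that no outside writer touches it while the workers run:

    Queue<Long> pending = new ConcurrentLinkedQueue<>();
    try (Statement s = conn.createStatement();
         ResultSet rs = s.executeQuery(
             "SELECT id FROM work_table WHERE resume = 0 ORDER BY id")) {
        while (rs.next()) pending.add(rs.getLong(1));
    }
    // getNext() becomes a non-blocking in-memory poll; null means no work left
    Long next = pending.poll();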
2.:
Basically you need some sort of transaction, or at least another column that gets updated when a thread has finished processing a row. The processing and the update need to happen in a single transaction. If something happens while the transaction is not finished, you can roll back to the state in which the row wasn't processed.
If the threads are all on the same machine, they could use a shared data structure to avoid working on the same thing, instead of synchronization. But the following assumes the threads are on different machines (maybe different members of an application server cluster) and can only communicate via the database.
Remove the synchronization on the getNext() method. When setting the resume flag to 1 (step 2), do so atomically: UPDATE table SET resume = 1 WHERE id = ? AND resume = 0, then commit. Only one thread will succeed at this, and the thread that does gets that unit of work. At the same time, set a resume time; if the resume time is older than some maximum, assume the thread working on that unit of work has crashed and set the resume flag back to 0. After the work is finished, set the resume time to null, or otherwise mark the work as done.
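A sketch of that atomic claim, assuming JDBC with auto-commit off and a hypothetical work_table(id, resume, resume_time); now() is PostgreSQL syntax, so adjust for your database:

    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE work_table SET resume = 1, resume_time = now() " +
            "WHERE id = ? AND resume = 0")) {
        ps.setLong(1, candidateId);
        if (ps.executeUpdate() == 1) { // exactly one row updated: this thread owns it
            conn.commit();
            // process the row, then mark it done and clear resume_time
        } else {
            conn.rollback();           // another thread claimed it first
        }
    }
    // A periodic reaper can reset stale claims, e.g.:
    // UPDATE work_table SET resume = 0
    //   WHERE resume = 1 AND resume_time < now() - interval '10 minutes'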
Well, I would think of several issues here:
Are you keeping status in your DB? I would look for an approach where you do a SELECT FOR UPDATE filtered by inactive status (be sure to get just one row in the select) and immediately update it to active, in the same transaction (a sketch follows at the end of this answer). It would be nice to know which DB you're using; I'm not sure SELECT FOR UPDATE is always an option.
Process, and when you're finished, update to finished status.
Be sure to keep a timestamp in the table to identify when you last changed the status. Make yourself a rule to decide when an active thread should be treated as lost.
Define other possible error scenarios (what happens if the process fails?).
You would also need to analyze the scenario: how many rows does your table have? How many threads call it concurrently? How many inserts occur in a given time? Depending on this you will have to see how DB performance holds up.
I'm assuming your getNext() is synchronized; with what I wrote in point 1 you might get around this.
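A sketch of point 1, assuming PostgreSQL 9.5+ and a hypothetical work_table; SKIP LOCKED hands each thread a different row instead of making them queue behind one another:

    conn.setAutoCommit(false);
    try (Statement s = conn.createStatement();
         ResultSet rs = s.executeQuery(
             "SELECT id FROM work_table WHERE status = 'INACTIVE' " +
             "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED")) {
        if (rs.next()) {
            long id = rs.getLong(1);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE work_table SET status = 'ACTIVE', status_time = now() " +
                    "WHERE id = ?")) {
                ps.setLong(1, id);
                ps.executeUpdate();
            }
            conn.commit(); // the claim is now visible; process, then mark FINISHED
        }
    }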
