Limit concurrent execution of an application process using database

I was asked to implement an "access policy" to limit the number of concurrent executions of a certain process within an application (NOT a web application) that connects directly to the database.
The application runs on several machines, and if more than one user tries to call the process, only one execution should be allowed at a given time; the others must get an error message (NOT wait for the first execution to end).
Although I'm using Java/Postgres, this is a fairly general question.
Given that I have the same application running in several machines, the simplest solution I can think of is implementing some sort of "database flag".
Something like checking whether the process is currently active:
SELECT Active FROM Process
If it's active, return a 'concurrent access policy error'. If not, activate it:
UPDATE Process SET Active = 'Y'
Once the execution is finished, simply update the active flag:
UPDATE Process SET Active = 'N'
However, I've encountered a major issue:
If I don't use a DB transaction to change the active flag and the application is killed, the active flag will remain 'Y' forever.
If I do use a DB transaction, the first problem is solved. However, the change of the active flag on one host (from N to Y) only becomes visible after the commit, so the other hosts will never read 'Y' and will therefore run the process anyway.
Any ideas?

Don't bother with an active flag; instead, simply lock a row based on the user ID. Keep that row locked in a dedicated transaction/connection. When another user tries to lock the row with SELECT ... FOR UPDATE NOWAIT, they'll get an error immediately (a plain FOR UPDATE would wait for the lock instead), and you can report it.
If the process holding the transaction fails, the lock is freed. If it quits, the lock is freed. If the DB is rebooted, the lock is freed.
Win all around.
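A minimal JDBC sketch of this idea for PostgreSQL. It assumes a hypothetical table process_lock(name text primary key) that already holds one row per guarded process (keyed by process name here rather than by user ID). NOWAIT makes PostgreSQL fail at once with SQLSTATE 55P03 (lock_not_available) instead of blocking, which is what lets the second caller return an error.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch: hold a row lock on a dedicated connection for as long as the process runs. */
final class ProcessRowLock {
    // PostgreSQL's SQLSTATE for "lock_not_available", raised by FOR UPDATE NOWAIT
    static final String LOCK_NOT_AVAILABLE = "55P03";

    // Hypothetical table: process_lock(name text primary key), one row per process
    private static final String LOCK_SQL =
        "SELECT name FROM process_lock WHERE name = ? FOR UPDATE NOWAIT";

    static boolean isLockNotAvailable(SQLException e) {
        return LOCK_NOT_AVAILABLE.equals(e.getSQLState());
    }

    /**
     * Runs the job while holding the row lock. Returns false (without running the job)
     * if another session already holds it. lockConn should be dedicated to the lock.
     */
    static boolean runExclusively(Connection lockConn, String processName, Runnable job)
            throws SQLException {
        lockConn.setAutoCommit(false); // the lock lives as long as this transaction
        try (PreparedStatement ps = lockConn.prepareStatement(LOCK_SQL)) {
            ps.setString(1, processName);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("No lock row for process " + processName);
                }
            }
        } catch (SQLException e) {
            lockConn.rollback();
            if (isLockNotAvailable(e)) {
                return false; // someone else is running the process
            }
            throw e;
        }
        try {
            job.run();
        } finally {
            lockConn.rollback(); // ends the transaction and releases the row lock
        }
        return true;
    }
}
```

If the JVM dies or the connection drops, the server ends the transaction, so the lock disappears without any cleanup code.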

Instead of having only a simple Y/N flag, store the timestamp at which active has been set, and have your client application refresh it regularly (say every minute, or every five minutes). Then if a client crashes, the other clients only have to wait just over that time limit before assuming that client is dead and taking over. This is a kind of "heartbeat" mechanism to check that the client that started the process is still alive.
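A sketch of the heartbeat approach, assuming PostgreSQL, autocommit on (so every statement is visible to other hosts immediately), and a hypothetical table process_lease(name text primary key, owner text, heartbeat timestamptz). Taking over is a single conditional UPDATE, so two clients cannot both claim the same expired lease; isExpired spells out the timeout rule.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.Duration;
import java.time.Instant;

/** Sketch of a heartbeat ("lease") flag; table and column names are hypothetical. */
final class LeaseFlag {
    private static final String ACQUIRE_SQL =
        "UPDATE process_lease SET owner = ?, heartbeat = now() "
        + "WHERE name = ? AND (owner IS NULL OR heartbeat < now() - make_interval(secs => ?))";
    private static final String HEARTBEAT_SQL =
        "UPDATE process_lease SET heartbeat = now() WHERE name = ? AND owner = ?";
    private static final String RELEASE_SQL =
        "UPDATE process_lease SET owner = NULL WHERE name = ? AND owner = ?";

    /** A lease is considered dead once its last heartbeat is older than the timeout. */
    static boolean isExpired(Instant lastHeartbeat, Instant now, Duration timeout) {
        return lastHeartbeat.plus(timeout).isBefore(now);
    }

    /** Atomically claims the lease if it is free or expired; true if this owner got it. */
    static boolean tryAcquire(Connection c, String name, String owner, Duration timeout)
            throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(ACQUIRE_SQL)) {
            ps.setString(1, owner);
            ps.setString(2, name);
            ps.setDouble(3, timeout.toMillis() / 1000.0);
            return ps.executeUpdate() == 1;
        }
    }

    /** Call regularly, well within the timeout; false means the lease was lost. */
    static boolean heartbeat(Connection c, String name, String owner) throws SQLException {
        return update(c, HEARTBEAT_SQL, name, owner);
    }

    static boolean release(Connection c, String name, String owner) throws SQLException {
        return update(c, RELEASE_SQL, name, owner);
    }

    private static boolean update(Connection c, String sql, String name, String owner)
            throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, name);
            ps.setString(2, owner);
            return ps.executeUpdate() == 1;
        }
    }
}
```

Under READ COMMITTED, PostgreSQL makes a second concurrent UPDATE of the same row wait and then re-check its WHERE clause, so only one of two racing takers sees an affected row count of 1.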
A simpler solution would be to configure the database to accept only one connection at a time?

I am not sure an RDBMS is the best system to solve this kind of issue, but I recently implemented a similar thing in SQL Server 2012, so here's what I learned from that experience.
In principle, you need an atomic "check the value, then update the value (of one single record)" operation, i.e. an atomic SELECT/UPDATE. This is what makes the matter complex. Because RDBMSs normally offer no such standard single atomic operation, you can familiarize yourself with and use ISOLATION LEVEL SERIALIZABLE.
This is how I implemented it in SQL Server 2012; I've tested it thoroughly and it works fine. I have a table called DistributedLock; each record in it represents a logical lock. The operations I allow are tryLock and releaseLock (implemented as two stored procedures). tryLock is non-blocking (practically non-blocking). If it succeeds, it returns an ID/stamp to the caller, who can later use that ID/stamp to call releaseLock. If one calls releaseLock without actually holding the lock (that is, without having the latest ID/stamp), the call succeeds and does nothing; otherwise (if the caller holds the lock) the call succeeds and releases the lock held by the caller. I also support timeouts: if some process grabs the ID/stamp of a given lock/record and forgets to release it, the lock expires automatically after some time.
Here is what the table looks like:
[DistributedLockID] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL -- surrogate PK
[ResourceID] [nvarchar](256) NOT NULL -- resource/lock logical identifier
[Duration] [int] NOT NULL
[AcquisitionTime] [datetime] NULL
[RecordStamp] [bigint] NOT NULL
I guess you can figure out the rest (or try, and then ping me if you get stuck).
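The stored procedures themselves are left to the reader. As a rough illustration of the semantics described (non-blocking tryLock returning a stamp, releaseLock that silently does nothing without the current stamp, automatic expiry after Duration), here is an in-memory Java model. It is not the author's SQL Server code, and the class and method names are hypothetical; in the database, each method would be one SERIALIZABLE transaction over a DistributedLock row, which synchronized stands in for here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** In-memory model of the DistributedLock semantics: one entry per ResourceID. */
final class DistributedLockModel {
    /** Mirrors the table columns: Duration, AcquisitionTime (null = free), RecordStamp. */
    private static final class Row {
        long durationMillis;
        Long acquisitionTime; // null means the lock is free
        long recordStamp;
    }

    private final Map<String, Row> rows = new HashMap<>();

    /** Non-blocking: returns a stamp if the lock was free or expired, empty otherwise. */
    synchronized OptionalLong tryLock(String resourceId, long durationMillis, long nowMillis) {
        Row row = rows.computeIfAbsent(resourceId, k -> new Row());
        boolean free = row.acquisitionTime == null
            || nowMillis >= row.acquisitionTime + row.durationMillis; // expired
        if (!free) {
            return OptionalLong.empty();
        }
        row.acquisitionTime = nowMillis;
        row.durationMillis = durationMillis;
        row.recordStamp++; // a new stamp invalidates any older holder
        return OptionalLong.of(row.recordStamp);
    }

    /** Releases only if the caller presents the current stamp; otherwise does nothing. */
    synchronized void releaseLock(String resourceId, long stamp) {
        Row row = rows.get(resourceId);
        if (row != null && row.recordStamp == stamp) {
            row.acquisitionTime = null;
        }
    }
}
```

Bumping the stamp on every acquisition is what makes a late releaseLock from an expired holder harmless: it no longer has the latest stamp.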

Related

How to SET LOCK MODE in java application

I am working on a Java web application that uses WebLogic to connect to an Informix database. In the application we have multiple threads creating records in a table.
Quite often an operation fails with the following error:
java.sql.SQLException: Could not do a physical-order read to fetch next row....
Caused by: java.sql.SQLException: ISAM error: record is locked.
I am assuming that both threads are trying to insert or update when the record is locked.
I did some research and found that there is a setting that makes the database wait for the lock to be released instead of throwing an error:
SET LOCK MODE TO WAIT;
SET LOCK MODE TO WAIT 17;
I don't think that there is an option in JDBC to use this setting. How do I go about using this setting in my java web app?

You can always send that SQL directly: obtain a Statement with createStatement() and execute that exact SQL.
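A minimal sketch of sending the statement through plain JDBC. The helper names are hypothetical, and mapping a negative value to "wait forever" and 0 to NOT WAIT is a choice made for this sketch; the setting applies only to the connection it is executed on.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch: send Informix's SET LOCK MODE as plain SQL on an open connection. */
final class InformixLockMode {
    /** Builds the statement; seconds < 0 means wait forever, 0 means do not wait. */
    static String lockModeSql(int seconds) {
        if (seconds < 0) {
            return "SET LOCK MODE TO WAIT";
        }
        if (seconds == 0) {
            return "SET LOCK MODE TO NOT WAIT";
        }
        return "SET LOCK MODE TO WAIT " + seconds;
    }

    /** Applies the setting to this connection (session) only. */
    static void apply(Connection conn, int seconds) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(lockModeSql(seconds));
        }
    }
}
```

With a connection pool, apply would need to run for every physical connection, typically right after it is opened.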
The more 'normal' / modern approach to this problem is a combination of MVCC, the transaction level 'SERIALIZABLE', retry, and random backoff.
I have no idea whether Informix is anywhere near that advanced, though. Modern DBs such as Postgres are (MySQL does not count as modern for the purposes of MVCC/serializable/retry/backoff and transactional safety).
Doing MVCC/serializable/retry/backoff in raw JDBC is very complicated; use a library such as JDBI or jOOQ.
MVCC: A mechanism whereby transactions are shallow clones of the underlying data. 2 separate transactions can both read and write to the same records in the same table without getting in each other's way. Things aren't 'saved' until you commit the transaction.
SERIALIZABLE: A transaction level (also called isolation level), settable with jdbcDbObj.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE); it is the safest level. If you know how version control systems work: you're asking the database to aggressively rebase everything so that the entire chain of commits is ordered into a single line of events, where each transaction acts as if it was done after the previous transaction had completed. The simplest way to implement this level is to globally lock everything, which is, of course, very detrimental to multithreaded performance. In practice, good DB engines (such as Postgres) are smarter than that: multiple threads can run transactions simultaneously without being frozen waiting for locks; instead, the DB engine checks whether what each transaction did (not just its writes, also its reads) is conflict-free with the simultaneous transactions. If it is, everything is allowed. If not, all but one of the conflicting transactions fail with a retry exception. This is the only level that lets you do the following sequence of events safely:
1. Fetch the balance of isaace's bank account.
2. Fetch the balance of rzwitserloot's bank account.
3. Subtract €10,- from isaace's number, failing if the balance is insufficient.
4. Add €10,- to rzwitserloot's number.
5. Write isaace's new balance to the db.
6. Write rzwitserloot's new balance to the db.
7. Commit the transaction.
Any level below SERIALIZABLE will silently get this wrong: if multiple threads do the above simultaneously, no SQLException occurs, but the sum of isaace's and rzwitserloot's balances will change over time. Money is lost or created: between steps 1 & 2 and steps 5-7, another thread sets new balances, but those new balances are overwritten by the writes in steps 5-7. With SERIALIZABLE, that cannot happen.
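To make the failure concrete, here is a deterministic, database-free simulation of that interleaving, with a HashMap standing in for the table. The assumption for illustration is that the other thread moves €10 from isaace to a hypothetical third account, carol, after reading isaace's balance at the same moment.

```java
import java.util.HashMap;
import java.util.Map;

/** Simulates two read-then-write transfers interleaved without SERIALIZABLE. */
final class LostUpdateDemo {
    static Map<String, Integer> run() {
        Map<String, Integer> db = new HashMap<>();
        db.put("isaace", 100);
        db.put("rzwitserloot", 0);
        db.put("carol", 0); // hypothetical third account

        // Steps 1 & 2 for both threads: each reads the balances it needs.
        int t1From = db.get("isaace"), t1To = db.get("rzwitserloot");
        int t2From = db.get("isaace"), t2To = db.get("carol");

        // Thread 2 writes first: isaace -> carol, 10.
        db.put("isaace", t2From - 10);
        db.put("carol", t2To + 10);

        // Thread 1 writes balances computed from its stale read: isaace -> rzwitserloot, 10.
        db.put("isaace", t1From - 10); // overwrites thread 2's debit
        db.put("rzwitserloot", t1To + 10);
        return db;
    }

    static int total(Map<String, Integer> db) {
        return db.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

The total goes from 100 to 110: isaace is debited only once although two transfers went out. Under SERIALIZABLE, one of the two transactions would fail with a serialization error instead.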
RETRY: Smart DBs solve the problem by failing (with a 'retry' error) all but one of the conflicting transactions: they check whether any SELECT done by a transaction would be affected by transactions that were committed after this transaction started. If so (some SELECTs would have returned different results), the transaction fails. The point of this error is to tell the code that ran the transaction to simply start from the top and do it again. Most likely there won't be a conflict this time, and it will work. The assumption is that conflicts CAN occur but usually don't, so it is better to assume 'fair weather' (no locks, just do your stuff), check afterwards, and try again in the exotic case of a conflict, rather than trying to lock rows and tables. Ethernet, for example, works the same way (assume fair weather, recover from errors afterwards).
BACKOFF: One problem with retrying is that computers are too consistent: if 2 threads get in each other's way, they can both fail, both try again, and both fail again, forever. The solution is for the threads to twiddle their thumbs for a random amount of time, so that eventually one of the two conflicting retriers 'wins'.
In other words, if you want to do it 'right' (see the bank account example) but also relatively 'fast' (no global locking), get a DB that can do this and use JDBI or jOOQ. Otherwise, you'd have to write code that runs all DB work in a lambda, catches the SQLException, checks the SQLState to see whether it indicates that you should retry (SQLState codes are DB-engine specific), and if so, reruns that lambda after waiting an exponentially increasing amount of time that also includes a random factor. That's fairly complicated, which is why I strongly advise relying on jOOQ or JDBI to take care of this for you.
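For readers who do want to hand-roll it, here is a minimal sketch of that retry loop. It assumes SQLSTATE 40001 (serialization failure, as reported by PostgreSQL) marks a retryable error; other engines may use other codes. SqlWork and withRetry are hypothetical names, not a JDBI or jOOQ API.

```java
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;

/** Sketch: rerun a unit of DB work when the DB reports a serialization failure. */
final class SerializableRetry {
    /** One complete attempt: begin a SERIALIZABLE transaction, do everything, commit.
     *  On failure the attempt should roll back before throwing. */
    @FunctionalInterface
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    // SQLSTATE 40001 = serialization failure; adjust for your DB engine.
    static boolean isRetryable(SQLException e) {
        return "40001".equals(e.getSQLState());
    }

    static <T> T withRetry(SqlWork<T> work, int maxAttempts, long baseDelayMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isRetryable(e) || attempt >= maxAttempts) {
                    throw e; // not a conflict, or out of attempts
                }
                // Exponential backoff: base * 2^(attempt-1), plus random jitter in [0, that).
                long cap = baseDelayMillis << Math.min(attempt - 1, 20);
                long sleep = cap + ThreadLocalRandom.current().nextLong(Math.max(cap, 1));
                Thread.sleep(sleep);
            }
        }
    }
}
```

The random part of the delay is what breaks the lockstep described under BACKOFF; the exponential part keeps a burst of conflicts from hammering the DB.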
If you aren't ready for that level of DB usage, just send "SET LOCK MODE TO WAIT 17;" as a plain SQL statement right after opening any connection. If you're using a connection pool, there is usually a place where you can configure SQL statements to run when a connection is opened.

The Informix JDBC driver does allow you to automatically set the lock wait mode when you connect to the server.
Simply pass via the DataSource or connection URL the following parameter
IFX_LOCK_MODE_WAIT=17
The accepted values are:
(-1) Wait forever
(0) not wait (default)
(> 0) wait this many seconds
See https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.jdbc.doc/ids_jdbc_040.htm

// IFX_LOCK_MODE_WAIT=17: wait up to 17 seconds for a lock instead of failing immediately
Connection conn = DriverManager.getConnection(
    "jdbc:Informix-sqli://cleo:1550:IFXHOST=cleo;PORTNO=1550;user=rdtest;password=my_passwd;IFX_LOCK_MODE_WAIT=17");

How does lost update differ from non-repeatable read?

I am trying to understand isolation levels and the related issues, i.e. dirty reads, non-repeatable reads, phantom reads and lost updates.
I was reading about non-repeatable reads.
I had also read about lost updates.
What confuses me is that both look very similar: in an NRR (non-repeatable read), Tx B updates the row between two reads of the same row by Tx A, so Tx A gets different results.
In the case of a lost update, Tx B overwrites changes committed by Tx A.
So to me, these two seem quite similar and related.
Is that correct?
My understanding is that using 'optimistic locking' prevents the 'lost update' issue
(based on some very good answers here).
My confusion:
Would it also mean that by using 'optimistic locking' we also eliminate the 'non-repeatable read' issue?
All of these questions pertain to a Java J2EE application with an Oracle database.
NOTE: to avoid distractions, I am not looking for details on dirty reads and phantom reads; my focus is entirely on non-repeatable reads and lost updates.

Non-repeatable reads, lost updates, phantom reads, as well as dirty reads, are about transaction isolation levels, rather than pessimistic/optimistic locking. I believe Oracle's default isolation level is read committed, meaning that only dirty reads are prevented.
Non-repeatable reads and lost updates are indeed related, as both can occur at the same isolation level. Neither can be avoided by locking alone unless you set the appropriate isolation level, but you can use versioning (a column value that is checked on every update and incremented by it) to at least detect the issue (and take the necessary action).
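A minimal sketch of the version-column check, assuming a hypothetical table item(id, value, version). The SQL updates the row only if nobody has bumped the version since it was read, so an affected row count of 0 means a conflicting update was detected. The in-memory class mirrors that statement so the scenario can run without a database.

```java
import java.util.Objects;

/** In-memory stand-in for one row with a version column. */
final class VersionedRow {
    // Equivalent SQL (hypothetical table): a row count of 0 means someone else updated first.
    static final String UPDATE_SQL =
        "UPDATE item SET value = ?, version = version + 1 WHERE id = ? AND version = ?";

    private String value;
    private long version;

    VersionedRow(String value) {
        this.value = value;
    }

    synchronized String value() { return value; }

    synchronized long version() { return version; }

    /** Mirrors UPDATE_SQL: applies the change only if expectedVersion is still current. */
    synchronized boolean update(String newValue, long expectedVersion) {
        if (version != expectedVersion) {
            return false; // conflict detected: the caller should re-read and decide again
        }
        value = Objects.requireNonNull(newValue);
        version++;
        return true;
    }
}
```

This detects the "query, someone else updates, you update" sequence; it does nothing about two plain reads returning different values.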

The purpose of repeatable reads is to provide read-consistent data:
within a query, all the results should reflect the state of the data at a specific point in time;
within a transaction, the same query should return the same results even if it is repeated.
In Oracle, queries are read-consistent as of the moment the query started. If data changes during the query, the query reads the version of the data that existed at the start of the query. That version is available in the "UNDO".
Bottom line: Oracle by default has an isolation level of READ COMMITTED, which guarantees read-consistent data within a query, but not within a transaction.
You talk about Tx A and Tx B. In Oracle, a session that does not change any data does not have a transaction.
Assume the default isolation level of READ COMMITTED. Assume the J2EE application uses a connection pool and is stateless.
app thread A connects to session X and reads a row.
app thread B connects to session Y and updates the row with commit.
app thread A connects to session Z and reads the same row, seeing a different result.
Notice that there is nothing any database can do here. Even if all the sessions had the SERIALIZABLE isolation level, session Z has no idea what is going on in session X. Besides, thread A cannot leave a transaction hanging in session X when it disconnects.
To your question, notice that app thread A never changed any data. The human user behind app thread A queried the same data twice and saw two different results, that is all.
Now let's do an update:
app thread A connects to session X and reads a row.
app thread B connects to session Y and updates the row with commit.
app thread A connects to session Z and updates the same row with commit.
Here the same row had three different values, not two. The human user behind thread A saw the first value and changed it to the third value without ever seeing the second value! That is what we mean by a "lost update".
The idea behind optimistic locking is to notify the human user that, between the time they queried the data and the time they asked to update it, someone else changed the data first. They should look at the most recent values before confirming the update.
To simplify:
"non-repeatable reads" happen if you query, then I update, then you query.
"lost updates" happen if you query, then I update, then you update. Notice that if you query the data again, you need to see the new value in order to decide what to do next.
Suggested reading: https://blogs.oracle.com/oraclemagazine/on-transaction-isolation-levels
Best regards, Stew Ashton

Uniqueness check without DB constraints

For example, we have a table (login, hash). There is no unique constraint on the login column, but we need to keep it unique (just for example).
When a new user registers, we check whether the entered login is free.
If it's a Java web app deployed to Tomcat, which has a thread pool, then those checks might be processed in parallel, right? How can we ensure uniqueness then?

You can use a pessimistic lock on the table: lock the table, check whether the login exists, and save the new record, so that no other thread can change the table in the meantime. But I think that is a really bad way to do things; why not use DB constraints?

In short, you can't have a good solution without database constraints here.
Without a constraint in a multi-threaded environment you'll need some common resource to synchronize your threads on. A thread would acquire the mutex, check if login is free (using a SELECT) and then INSERT a new record if it was free. No other thread should be able to do this at the same time - this is why you need synchronization here.
This will work if and only if all your threads have access to this mutex and it is guaranteed that no one else can access the database at the same time.
The first problem appears if you have, for instance, several machines accessing the same database. Threads running on different machines will not have access to the same mutex, so they will happily insert into your table in parallel.
The other problem is that if someone logs in to the database and creates records in that table directly, such inserts may happen exactly between the SELECT and INSERT executed by your code. So synchronization in code won't help here.
A further option is locking the whole table, but that's even worse. You'll need to very reliably release the lock otherwise you're risking stalling the whole system.
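To illustrate the mutex approach and its limits, here is a sketch in which a JVM-local lock guards the check-then-insert. The in-memory list stands in for the (login, hash) table, and LoginRegistry is a hypothetical name; as explained above, this only holds within one JVM and offers no protection against other machines or direct inserts, which is why the DB constraint remains the better fix.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: check-then-insert guarded by a JVM-local mutex. Only safe within one JVM. */
final class LoginRegistry {
    private final Object mutex = new Object();
    // Stand-in for the (login, hash) table; real code would run a SELECT and an INSERT.
    private final List<String[]> table = new ArrayList<>();

    boolean register(String login, String hash) {
        synchronized (mutex) {
            for (String[] row : table) {           // "SELECT ... WHERE login = ?"
                if (row[0].equals(login)) {
                    return false;                  // login already taken
                }
            }
            table.add(new String[] {login, hash}); // "INSERT INTO ..."
            return true;
        }
    }

    int size() {
        synchronized (mutex) {
            return table.size();
        }
    }
}
```

Without the synchronized block, two threads could both pass the check before either inserts, producing a duplicate login.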

How to check if a specific row with certain values has been inserted in the Database

I am using an Informix DB, but this question may not be tied to one specific database. I want to know how I can, in Java, continuously probe a database and check whether a certain row has been added to a table. Basically, the flow is:
My Java application should use JDBC to check whether a certain table is populated.
If not, it should wait until a row has been inserted.
My question is how Java can become aware of a row insertion. I am not expecting to add any triggers or anything; I want to check that the row was added in pure Java.
Some options that come to mind are to query the DB for the row continuously, or periodically (every half-hour or so) to check whether the row is available. But what I am looking for is something like a listener that can do this.

There is no facility in the Informix DBMS to signal when a particular row arrives in a table.
Well, I say that, but there is the DB-Cron facility, which can periodically execute tasks (inside the server), and you could conceivably schedule a task to poll for the data and send a message (somehow) when it has arrived. It would be non-trivial, especially the part that indicates that it has arrived.
The JDBC protocol (like SQL protocols generally) is essentially synchronous: the client sends a request and waits for an answer from the DBMS.
So, pragmatically, if your delay period is half an hour, you can either create an admin task to handle the processing (you could write a Java UDR to be executed in the server by the server if that's crucial to you), or you can arrange for the Java (client-side) program to poll periodically to find out whether the information you need is there. A half-hour delay is not going to stress anything, even with a moderate number of processes polling for separate values (or even the same value). On the other hand, you normally try to avoid polling when you can. You'll need to strike a balance between responsiveness to the special data arriving and general system responsiveness. On the whole, general system responsiveness is more important, so keep the polling interval as large as you can.
If your polling interval needed to be sub-second, then the balance would be different - the job would be a lot harder.
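A minimal sketch of the client-side polling option. RowPoller and waitFor are hypothetical names; the check is abstracted as a BooleanSupplier so the loop can be shown without a database. In real code it would run something like a SELECT 1 ... WHERE query over JDBC and wrap any SQLException.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Sketch: client-side polling until a condition (e.g. "the row exists") becomes true. */
final class RowPoller {
    /** Calls check every interval until it returns true (true) or the timeout elapses (false). */
    static boolean waitFor(BooleanSupplier check, Duration interval, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

With a half-hour interval, the thread spends nearly all its time sleeping, so the load on the DBMS is one small query per interval per poller.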

Restricting a SELECT statement in SQL Server 2005

I have a situation where, before doing a particular task, I have to check whether a particular flag is set in the DB; if it is not set, the rest of the processing is done and the flag is set. Now, with concurrent access from 2 different transactions, if the first transaction checks the flag and finds it not set, it proceeds. At the same time, I want to prevent the second transaction from checking the flag, i.e. I want to stop it from executing the SELECT query until the first transaction completes its processing and sets the flag.
I wanted to implement this at the DB level with locks/hints. But no hint restricts SELECT queries, and I cannot use isolation level restrictions.

You can create an Application Lock to protect your flag, so that the second transaction will not perform the SELECT or access the flag if it cannot acquire the Application Lock.
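A sketch of taking such a lock from Java with SQL Server's sp_getapplock, assuming Microsoft's JDBC driver and its support for the {? = call ...} escape with a return value. The parameters are passed positionally (@Resource, @LockMode, @LockOwner, @LockTimeout); a return code of 0 or 1 means the lock was granted, a negative code means it was not. The resource name is whatever string you choose to represent the flag.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

/** Sketch: guard the flag check with a SQL Server application lock (sp_getapplock). */
final class AppLockGuard {
    /** sp_getapplock returns 0 or 1 when the lock is granted, a negative value otherwise. */
    static boolean isGranted(int returnCode) {
        return returnCode >= 0;
    }

    /**
     * Tries to take an exclusive, transaction-owned application lock without waiting.
     * Call with autocommit off; the lock is released on commit or rollback.
     */
    static boolean tryLock(Connection conn, String resource) throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{? = call sp_getapplock(?, ?, ?, ?)}")) {
            cs.registerOutParameter(1, Types.INTEGER);
            cs.setString(2, resource);      // @Resource
            cs.setString(3, "Exclusive");   // @LockMode
            cs.setString(4, "Transaction"); // @LockOwner
            cs.setInt(5, 0);                // @LockTimeout: 0 = do not wait
            cs.execute();
            return isGranted(cs.getInt(1));
        }
    }
}
```

The intended flow is: begin the transaction, call tryLock, and only if it returns true SELECT the flag, do the work, set the flag and commit.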

I believe that SQL Server 2005 does this natively by not permitting a dirty read. That is, as I understand it, as long as the update / insert occurs before the second user tries to do the select to check the flag, the db will wait for the update / insert to be committed before processing the select.
Here are some common locks that may assist you as well, if you'd like more granularity.
Edit: XLOCK may also help, and wrapping the SQL in a transaction may help as well.

You could try a stored procedure that does both tasks, or one that acts as an entry point to 2 distinct stored procedures doing the different tasks (something like a proxy).
Stored procedures act as monitors in SQL Server, so they are tools for managing concurrency (which is what you want to do).

You simply need to start a transaction in your SP/code and then update the flag. That will block any other user from reading it (unless they are reading uncommitted).
If they are reading uncommitted, set an exclusive lock on your update transaction.
