How to SET LOCK MODE in java application - java

I am working on a Java web application that uses Weblogic to connect to an Informix database. In the application we have multiple threads creating records in a table.
It happens pretty often that it fails and the following error is thrown:
java.sql.SQLException: Could not do a physical-order read to fetch next row....
Caused by: java.sql.SQLException: ISAM error: record is locked.
I am assuming that both threads are trying to insert or update when the record is locked.
I did some research and found that there is an option to set the database that instead of throwing an error, it should wait for the lock to be released.
SET LOCK MODE TO WAIT;
SET LOCK MODE TO WAIT 17;
I don't think that there is an option in JDBC to use this setting. How do I go about using this setting in my java web app?

You can always just send that SQL straight up, using createStatement(), and then send that exact SQL.
The more 'normal' / modern approach to this problem is a combination of MVCC, the transaction level 'SERIALIZABLE', retry, and random backoff.
I have no idea if Informix is anywhere near that advanced, though. Modern DBs such as Postgres are (mysql does not count as modern for the purposes of MVCC/serializable/retry/backoff, and transactional safety).
Doing MVCC/Serializable/Retry/Backoff in raw JDBC is very complicated; use a library such as JDBI or JOOQ.
MVCC: A mechanism whereby transactions are shallow clones of the underlying data. 2 separate transactions can both read and write to the same records in the same table without getting in each other's way. Things aren't 'saved' until you commit the transaction.
SERIALIZABLE: A transaction level (also called isolationlevel), settable with jdbcDbObj.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE); - the safest level. If you know how version control systems work: You're asking the database to aggressively rebase everything so that the entire chain of commits is ordered into a single long line of events: Each transaction acts as if it was done after the previous transaction was completed. The simplest way to implement this level is to globally lock all the things. This is, of course, very detrimental to multithread performance. In practice, good DB engines (such as postgres) are smarter than that: Multiple threads can simultaneously run transactions without just being frozen and waiting for locks; the DB engine instead checks if the things that the transaction did (not just writing, also reading) is conflict-free with simultaneous transactions. If yes, it's all allowed. If not, all but one simultaneous transaction throw a retry exception. This is the only level that lets you do this sequence of events safely:
Fetch the balance of isaace's bank account.
Fetch the balance of rzwitserloot's bank account.
subtract €10,- from isaace's number, failing if the balance is insufficient.
add €10,- to rzwitserloot's number.
Write isaace's new balance to the db.
Write rzwitserloot's new balance to the db.
commit the transaction.
Any level less than SERIALIZABLE will silently fail the job; if multiple threads do the above simultaneously, no SQLExceptions occur but the sum of the balance of isaace and rzwitserloot will change over time (money is lost or created – in between steps 1 & 2 vs. step 5/6/7, another thread sets new balances, but these new balances are lost due to the update in 5/6/7). With serializable, that cannot happen.
RETRY: The way smart DBs solve the problem is by failing (with a 'retry' error) all but one transaction, by checking if all SELECTs done by the entire transaction are not affected by any transactions that been committed to the db after this transaction was opened. If the answer is yes (some selects would have gone differently), the transaction fails. The point of this error is to tell the code that ran the transaction to just.. start from the top and do it again. Most likely this time there won't be a conflict and it will work. The assumption is that conflicts CAN occur but usually do not occur, so it is better to assume 'fair weather' (no locks, just do your stuff), check afterwards, and try again in the exotic scenario that it conflicted, vs. trying to lock rows and tables. Note that for example ethernet works the same way (assume fair weather, recover errors afterwards).
BACKOFF: One problem with retry is that computers are too consistent: If 2 threads get in the way of each other, they can both fail, both try again, just to fail again, forever. The solution is that the threads twiddle their thumbs for a random amount of time, to guarantee that at some point, one of the two conflicting retriers 'wins'.
In other words, if you want to do it 'right' (see the bank account example), but also relatively 'fast' (not globally locking), get a DB that can do this, and use JDBI or JOOQ; otherwise, you'd have to write code to run all DB stuff in a lambda block, catch the SQLException, check the SqlState to see if it is indicating that you should retry (sqlstate codes are DB-engine specific), and if yes, rerun that lambda, after waiting an exponentially increasing amount of time that also includes a random factor. That's fairly complicated, which is why I strongly advise you rely on JOOQ or JDBI to take care of this for you.
If you aren't ready for that level of DB usage, just make a statement and send "SET LOCK MDOE TO WAIT 17;" as SQL statement straight up, at the start of opening any connection. If you're using a connection pool there is usually a place you can configure SQL statements to be run on connection start.

The Informix JDBC driver does allow you to automatically set the lock wait mode when you connect to the server.
Simply pass via the DataSource or connection URL the following parameter
IFX_LOCK_MODE_WAIT=17
The values for JDBC are
(-1) Wait forever
(0) not wait (default)
(> 0) wait this many seconds
See https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.jdbc.doc/ids_jdbc_040.htm

Connection conn = DriverManager.getConnection ( "jdbc:Informix-sqli://cleo:1550:
IFXHOST=cleo;PORTNO=1550;user=rdtest;password=my_passwd;IFX_LOCK_MODE_WAIT=17";);

Related

How to resolve java.sql.SQLException distributed transaction waiting for lock

We are using Oracle 11G and JDK1.8 combination.
In our application we are using XAConnection, XAResource for DB transaction.
ie) distributed transactions.
On few occasions we need to kill our Java process to stop the application.
After killing, if we restart our application then we are getting the below exception while doing DB transaction.
java.sql.SQLException: ORA-02049: timeout: distributed transaction
waiting for lock
After this for few hours we are unable to use our application till the lock releases.
Can someone provide me some solution so that we can continue working instead of waiting for the lock to release.
I have tried the below option:
a) Fetched the SID and killed the session using alter command.After this also table lock is not released.
I am dealing with very small amount of data.
I followed one topic similar with that with tips about what to do with distributed connections.
Oracle connections remains open until you end your local session or until the number of database links for your session exceeds the value of OPEN_LINKS. To reduce the network overhead associated with keeping a database link open, then use this clause to close the link explicitly if you do not plan to use it again in your session.
I believe that, by closing your connections and sessions after DDL execution, this issue should not happens.
Other possibility is given on this question:
One possible way might be to increase the INIT.ORA parameter for distributed_lock_timeout to a larger value. This would then give you a longer time to observe the v$lock table as the locks would last for longer.
To achieve automation of this, you can either
- Run an SQL job every 5-10 seconds that logs the values of v$lock or the query that sandos has given above into a table and then analyze it to see which session was causing the lock.
- Run a STATSPACK or an AWR Report. The sessions that got locked should show up with high elapsed time and hence can be identified.
v$session has 3 more columns blocking_instance, blocking_session, blocking_session_statusthat can be added to the query above to give a picture of what is getting locked.
I hope I helped you, my friend.

MySQL - Single connection versus a Pool

Due to some previous questions that I've had answered about the synchronous nature of MySQL I'm starting to question the reason people use Connection pools, and if in my scenario I should move to a pool.
Currently my application keeps a single connection active. There's only a single connection, statement, and result set being used in my application that's recycled. All of my database tasks are placed in a queue and executed back to back on a seperate thread. One thread for database queries, One connection for database access. In the event that the connection has an issue, it will dispose of the connection and create a new one.
From my understanding regardless of how many queries are sent to MySQL to be processed they will all be processed synchornously in the order they are received. It does not matter if these queries are coming from a single source or multiple, they will be executed in the order received.
With this being said, what's the point in having multiple connections and threads to smash queries into the database's processing queue, when regardless it's going to process them one by one anyway. A query is not going to execute until the query before it has completed processing, and like-wise in my scenario where I'm not using a pool, the next query is not going to be executed until the previous query has completed processing.
Now you may say:
The amount of time spent on processing the results provided by the MySQL query will increase the amount of time between queries being executed.
That's obviously correct, which is why I have a worker thread that handles the results of a query. When a query is completed, I convert the results into Map<> format and release the statement/resultset from memory and start processing the next query. The Map<> is sent off to a separate Worker thread for processing, so it doesn't congest the query execution thread.
Can anyone tell me if the way I'm doing things is alright, and if I should take the time to move to a pool of connections rather than a persistent connection. The most important thing is why. I'm starting this thread strictly for informational purposes.
EDIT: 4/29/2016
I would like to add that I know what a connection pool is, however I'm more curious about the benefits of using a pool over a single persistent connection when the table locks out requests from all connections during query processing to begin with.
Just trying this StackOverflow thing out but,
In every connection to a database, most of the time, it's idle. When you execute a query in the connection to INSERT or UPDATE a table, it locks the table, preventing concurrent edits. While this is good and all, preventing data overwriting or corruption, this means that no other connections may make edits while the first connection/query is still running.
However, starting a new connection takes time, and in larger infrastructures trying to skim and skin off all excess time wastage, this is not good. As such, connection pools are a whole group of connections left in the idle state, ready for the next query.
Lastly, if you are running a small project, there's usually no reason for a connection pool but if you are running a large site with UPDATEs and INSERTs flying around every millisecond, a connection pool reduces overhead time.
Slightly related answer:
a pool can do additional "connection health checks" (by examining SQL exception codes)
and refresh connections to reduce memory usage (see the note on "maxLifeTime" in the answer).
But all those things might not outweigh the simpler approach using one connection.
Another factor to consider is (blocking) network I/O times. Consider this (rough) scenario:
client prepares query --> client sends data over the network
--> server receives data from the network --> server executes query, prepares results
--> server sends data over the network --> client receives data from the network
--> client prepares resultset
If the database is local (on the same machine as the client) then network times are barely noticeable.
But if the database is remote, network I/O times can become measurable and impact performance.
Assuming the isolation level is at "read committed", running select-statements in parallel could become faster.
In my experience, using 4 connections at the same time instead of 1 generally improves performance (or throughput).
This does depend on your specific situation: if MySQL is indeed just mostly waiting on locks to get released,
adding additional connections will not do much in terms of speed.
And likewise, if the client is single-threaded, the client may not actually perceive any noticeable speed improvements.
This should be easy enough to test though: compare execution times for one program with 1 thread using 1 connection to execute an X-amount of select-queries
(i.e. re-use your current program) with another program using 4 threads with each thread using 1 separate connection
to execute the same X-amount of select-queries divided by the 4 threads (or just run the first program 4 times in parallel).
One note on the connection pool (like HikariCP): the pool must ensure no transaction
remains open when a connection is returned to the pool and this could mean a "rollback" is send each time a connection is returned to the pool
(closed) when auto-commit is off and no "commit" or "rollback" was send previously.
This in turn can increase network I/O times instead of reducing it. So make sure to test with either
auto-commit on or make sure to always send a commit or rollback after your query or set of queries is done.
Connection pool and persistent connection are not the same thing.
One is the limit of the number of SQL connections, the other is single Pipe issues.
The problem is generally the time taken to transfer the SQL output to the server than the query execution time. So if you open two cli SQL clients and fire two queries, one with a large output and one with a small output (in that sequence), the smaller one finishes first while the larger one is still scrolling its output.
The point here is that multiple connection does solve problems for cases like the above.
When you have multiple front end requests asking for queries, you may prefer persistent connections because it gives you the benefit of multiplex over different connections (large versus small outputs) and prevents the overhead of session setup/teardown.
Connection pool APIs have inbuilt error checks and handling but most APIs still expect you to manually declare if you want a persistent connection or not.
So in effect there are 3 variables, pool, persistence and config parameters via the API. One has to make a mix and match of pool size, persistence and number of connections to suite one's environment

Limit concurrent execution of an application process using database

I was asked to implement an "access policy" to limit the amount of concurrent executions of a certaing process within an application (NOT a web application) which has direct connection to the database.
The application is running in several machines, and if more than a user tries to call the process, only one execution should be allowed at a given time, and the other must return an error message (NOT wait for the first execution to end).
Although I'm using Java/Postgres, is sort of a general question.
Given that I have the same application running in several machines, the simplest solution I can think of is implementing some sort of "database flag".
Something like checking whether the process is currently active:
SELECT Active FROM Process
If it's active, return a 'concurrent access policy error'. If not, activate it:
UPDATE Process SET Active = 'Y'
Once the execution is finished, simply update the active flag:
UPDATE Process SET Active = 'N'
However, I've encountered a major issue:
If I don't use a DB transaction in order to change the active flag, and the application is killed, the the active flag will remain with the Y value forever.
If I use a DB transaction, the first point is solved. However, the change of the active flag in a host (from N to Y) will only be visible after the commit, so the other hosts will never read active with Y value and therefore execute anyway.
Any ideas?
Don't bother with an active flag, instead simply lock a row based on the user ID. Keep that row locked in a dedicated transaction/connection. When the other user tries to lock the row (using SELECT ... FOR UPDATE) you'll get an error, and you can report it.
If the process holding the transaction fails, the lock is freed. If it quits, the lock is freed. If the DB is rebooted, the lock is freed.
Win all around.
Instead of having only a simple Y/N flag, put the timestamp at which active as been set, and have your client application set it regularly (say every minute, or every five minute). Then if a client crashes, other clients will have to wait just over that time limit, and then assume that client is dead and take over. This is just some kind of "heartbeat" mechanism to check the client that started the process is still alive.
A simpler solution would be to configure the database to only accept one connection at the time?
I am not sure if a RDBMS is the best system to solve this kind of issue. But I recently implemented a similar thing in SQL Server 2012. So here's what I learned from that experience.
In general, I mean in principle, you need an atomic operation "check the value, update the value (of one single record)" i.e. an atomic SELECT/UPDATE. This makes the matter complex. And because normally there's no such standard single atomic operation in the RDBMSs, you can get familiar with and use ISOLATION LEVEL SERIALIZABLE.
This is how I implemented it in SQL Server 2012, and I've seriously tested it, it's working fine. I have a table called DistributedLock, each record from it represents a logical lock. The operations I allow are tryLock and releaseLock (these are implemented as two stored procedures). The tryLock is non-blocking (practically non-blocking). If it succeeds, it returns some ID/stamp to the caller who can use that ID/stamp later to call releaseLock. If one calls releaseLock without actually holding the lock (without having the latest ID/stamp that is), the call succeeds and does nothing, otherwise (if the caller has the lock) the call succeeds and releases the lock held by the caller. I also have support for timeouts. So if some process grabs the ID/stamp of a given lock/record, and forgets to release it, it will expire automatically after some time.
Here is how the table looks like.
[DistributedLockID] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL -- surrogate PK
[ResourceID] [nvarchar](256) NOT NULL -- resource/lock logical identifier
[Duration] [int] NOT NULL
[AcquisitionTime] [datetime] NULL
[RecordStamp] [bigint] NOT NULL
I guess you can figure out the rest (or try, and then ping me if you get stuck).

Improving performance for WRITE operation on Oracle DB in Java

I've a typical scenario & need to understand best possible way to handle this, so here it goes -
I'm developing a solution that will retrieve data from a remote SOAP based web service & will then push this data to an Oracle database on network.
Also, this will be a scheduled task that will execute every 15 minutes.
I've event queues on remote service that contains the INSERT/UPDATE/DELETE operations that have been done since last retrieval, & once I retrieve the events for last 15 minutes, it again add events for next retrieval.
Now, its just pushing data to Oracle so all my interactions are INSERT & UPDATE statements.
There are around 60 tables on Oracle with some of them having 100+ columns. Moreover, for every 15 minutes cycle there would be around 60-70 Inserts, 100+ Updates & 10-20 Deletes.
This will be an executable jar file that will terminate after operation & will again start on next 15 minutes cycle.
So, I need to understand how should I handle WRITE operations (best practices) to improve performance for this application as whole ?
Current Test Code (on every cycle) -
Connects to remote service to get events.
Creates a connection with DB (single connection object).
Identifies the type of operation (INSERT/UPDATE/DELETE) & table on which it is done.
After above, calls the respective method based on type of operation & table.
Uses Preparedstatement with positional parameters, & retrieves each column value from remote service & assigns that to statement parameters.
Commits the statement & returns to get event class to process next event.
Above is repeated till all the retrieved events are processed after which program closes & then starts on next cycle & everything repeats again.
Thanks for help !
If you are inserting or updating one row at a time,You can consider executing a batch Insert or a batch Update. It has been proven that if you are attempting to update or insert rows after a certain quantity, you get much better performance.
The number of DB operations you are talking about (200 every 15 minutes) is tiny and will be easy to finish in less than 15 minutes. Some concrete suggestions:
You should profile your application to understand where it is spending its time. If you don't do this, then you don't know what to optimize next and you don't know if something you did helped or hurt.
If possible, try to get all of the events in one round-trip to the remote server.
You should reuse the connection to the remote service (probably by using a library that supports connection persistence and reuse).
You should reuse the DB connections by using a connection pooling library rather than creating a new connection for each insert/update/delete. Believe it or not, creating the connection probably takes 100+ times as long as doing your DB operation once you have the connection in hand.
You should consider doing multiple (or all) of the database operations in the same transaction rather than creating a new transaction for each row that is changed. However, you should carefully consider your failure modes such that you don't lose any events (if that is an important consideration).
You should consider utilizing prepared statement caching. This may help, but maybe not if Oracle is configured properly.
You should consider trying to analyze your operations to find any that can be batched together. This can be a lot faster if you have some "hot" operations that get done often.
"I've a typical scenario"
No you haven't. You have a bespoke architecture, with a unique data model, unique data and unique business requirements. That's not a bad thing, it's the state of pretty much every computer system that's not been bought off-the-shelf (and even some of them).
So, it's an experiment and you must approach it as such. There is no "best practice". Try various things and see what works best.
"need to understand best possible way to handle this"
You will improve your chances of success enormously by hiring somebody who understands Oracle databases.

setAutocommit(true) further explanations

I have come across this oracle java tutorial. As a beginner in the topic I cannot grasp why it's needed to set con.setAutocommit(true); at the end of the transaction.
Here is the oracle explanation:
The statement con.setAutoCommit(true); enables auto-commit mode, which
means that each statement is once again committed automatically when
it is completed. Then, you are back to the default state where you do
not have to call the method commit yourself. It is advisable to
disable the auto-commit mode only during the transaction mode. This
way, you avoid holding database locks for multiple statements, which
increases the likelihood of conflicts with other users.
Could you explain it in other words? especially this bit:
This way, you avoid holding database locks for multiple statements,
which increases the likelihood of conflicts with other users.
What do they mean with "holding database locks for multiple statements"?
Thanks in advance.
The database has to perform row-level or table-level locking (based on your database-engine in MySQL) to handle transactions. If you keep the auto-commit mode off and keep executing statements, these locks won't be released until you commit the transactions. Based on the type, other transactions won't be able to update the row/table that is currently locked. setAutocommit(true) basically commits the current transaction, releases the locks currently held, and enables auto-commit, That is, until further required, each individual statement is executed and commited.
row-level locks protect the individual rows that take part in the transaction (InnoDB). Table-level locks prevent concurrent access to the entire table (MyIsam).
When one transaction updates a row in the database others transaction cannot alter this row until the first one finishes (commits or rollbacks), therefore if you do not need transactions it is advisable to set con.setAutocommit(true).
With most modern database systems you can batch together a series of SQL statements. Typically the ones you care about are inserts as these will block out a portion of the space on disk that is being written to. In JDBC this is akin to Statement.addBatch(sql). Now where this becomes problematic is when you try to implement pessimistic or optimistic locks on tuples in the database. So if you have a series of long running transactions that execute multiple batches you can find yourself in a situation where all reads get rejected because of these exclusive locks. I believe in Oracle there is no such thing as the dirty read so this can potentially be mitigated. But imagine the scenario where you are running a job that attempts to delete a record while I am updating it, this is the type of conflict that they are referring to.
With auto-commit on, each part of the batch is saved before moving on to the next unit of work. This is what you see when trying to persist millions of records and it slows down considerably. Because the system is ensuring consistency with each insert statement. There is a quick way to get around this in Oracle (if you are using oracle) is to use the oracle.sql package and look at the ARRAY class.
Most databases will autoCommit by default. That means that as soon as you execute a statement the results will immediately appear in the database and everyone else using the database will immediately see them.
There are times, however, when you need to perform a number of changes on the database which must all be done at once and if one fails you want to back out of all of them.
Say you have a cars database and you come across a new car from a new manufacturer. Here you may wish to create the manufacturer entry in your database and the new car record and make sure they both appear at once for other users. Otherwise there may be a confusing moment in your database where one exists without the other.
To achieve this you switch autoCommit off, execute the statements, commit them and then set autoCommit back on. This last switch on of autoCommit is probably what you are seeing.

Categories