I am working on an application with high number for DML operations due to which log file sync wait event is observed. We are using ebean framework for querying the Oracle database. I was looking for a way to reduce the number of commits. Is it advisable to use JDBC batch using batch size attribute for transactional calls.
Is it advisable to use JDBC batch using batch size attribute for
transactional calls.
Assuming a transaction is inserting, updating or deleting more that 1 bean/row then in short yes.
The caveat is that in terms of application code the actual execution of DML can occur later with statement flush at batch size, at commit time etc. This means statements can execute later in application code (like at commit time).
This typically only really matters to application code when application code is looking to handle exceptions like db constraint violations, missing foreign keys, unique constraints etc and actually continue the transaction. In this case we might need to add explicit transaction.flush() into the application code to ensure the statements have been executed and hit the database.
Related
I'm using NamedParameterJdbcTemplate. I need to inserts data to 5 different tables within a transaction.
The sequential execution of inserts take long time & I need to optimize the time taken for inserts.
One possible option is make all inserts parallel using threads. As far as I understood transaction is not propagate to multi threads.
How can I improve time taken for this operation within a transaction boundary ?
I don't think what you are trying to do can possibly work.
As far as I know a database transaction is always bound to a single connection.
And the JDBC connection API is blocking, i.e. you can only execute a single statement at a time. So even when you share the Spring transaction across multiple threads you'll still execute your SQL sequential.
I therefore see the following options which might be combined available to you:
Tune your database/SQL: batched inserts, disabled constraints, adding or removing indexes and so one might have a effect on the execution time.
Drop the transactional constraint.
If you can break your process into multiple processes you might be able to run them in parallel and actually gaining performance.
Tune/parallelise the part happening in your Java application so you can do other stuff while your SQL statements are running.
To decide which approach is most promising we'd need to know more about your actual scenario.
I am working on a Java web application that uses Weblogic to connect to an Informix database. In the application we have multiple threads creating records in a table.
It happens pretty often that it fails and the following error is thrown:
java.sql.SQLException: Could not do a physical-order read to fetch next row....
Caused by: java.sql.SQLException: ISAM error: record is locked.
I am assuming that both threads are trying to insert or update when the record is locked.
I did some research and found that there is an option to set the database that instead of throwing an error, it should wait for the lock to be released.
SET LOCK MODE TO WAIT;
SET LOCK MODE TO WAIT 17;
I don't think that there is an option in JDBC to use this setting. How do I go about using this setting in my java web app?
You can always just send that SQL straight up, using createStatement(), and then send that exact SQL.
The more 'normal' / modern approach to this problem is a combination of MVCC, the transaction level 'SERIALIZABLE', retry, and random backoff.
I have no idea if Informix is anywhere near that advanced, though. Modern DBs such as Postgres are (mysql does not count as modern for the purposes of MVCC/serializable/retry/backoff, and transactional safety).
Doing MVCC/Serializable/Retry/Backoff in raw JDBC is very complicated; use a library such as JDBI or JOOQ.
MVCC: A mechanism whereby transactions are shallow clones of the underlying data. 2 separate transactions can both read and write to the same records in the same table without getting in each other's way. Things aren't 'saved' until you commit the transaction.
SERIALIZABLE: A transaction level (also called isolationlevel), settable with jdbcDbObj.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE); - the safest level. If you know how version control systems work: You're asking the database to aggressively rebase everything so that the entire chain of commits is ordered into a single long line of events: Each transaction acts as if it was done after the previous transaction was completed. The simplest way to implement this level is to globally lock all the things. This is, of course, very detrimental to multithread performance. In practice, good DB engines (such as postgres) are smarter than that: Multiple threads can simultaneously run transactions without just being frozen and waiting for locks; the DB engine instead checks if the things that the transaction did (not just writing, also reading) is conflict-free with simultaneous transactions. If yes, it's all allowed. If not, all but one simultaneous transaction throw a retry exception. This is the only level that lets you do this sequence of events safely:
Fetch the balance of isaace's bank account.
Fetch the balance of rzwitserloot's bank account.
subtract €10,- from isaace's number, failing if the balance is insufficient.
add €10,- to rzwitserloot's number.
Write isaace's new balance to the db.
Write rzwitserloot's new balance to the db.
commit the transaction.
Any level less than SERIALIZABLE will silently fail the job; if multiple threads do the above simultaneously, no SQLExceptions occur but the sum of the balance of isaace and rzwitserloot will change over time (money is lost or created – in between steps 1 & 2 vs. step 5/6/7, another thread sets new balances, but these new balances are lost due to the update in 5/6/7). With serializable, that cannot happen.
RETRY: The way smart DBs solve the problem is by failing (with a 'retry' error) all but one transaction, by checking if all SELECTs done by the entire transaction are not affected by any transactions that been committed to the db after this transaction was opened. If the answer is yes (some selects would have gone differently), the transaction fails. The point of this error is to tell the code that ran the transaction to just.. start from the top and do it again. Most likely this time there won't be a conflict and it will work. The assumption is that conflicts CAN occur but usually do not occur, so it is better to assume 'fair weather' (no locks, just do your stuff), check afterwards, and try again in the exotic scenario that it conflicted, vs. trying to lock rows and tables. Note that for example ethernet works the same way (assume fair weather, recover errors afterwards).
BACKOFF: One problem with retry is that computers are too consistent: If 2 threads get in the way of each other, they can both fail, both try again, just to fail again, forever. The solution is that the threads twiddle their thumbs for a random amount of time, to guarantee that at some point, one of the two conflicting retriers 'wins'.
In other words, if you want to do it 'right' (see the bank account example), but also relatively 'fast' (not globally locking), get a DB that can do this, and use JDBI or JOOQ; otherwise, you'd have to write code to run all DB stuff in a lambda block, catch the SQLException, check the SqlState to see if it is indicating that you should retry (sqlstate codes are DB-engine specific), and if yes, rerun that lambda, after waiting an exponentially increasing amount of time that also includes a random factor. That's fairly complicated, which is why I strongly advise you rely on JOOQ or JDBI to take care of this for you.
If you aren't ready for that level of DB usage, just make a statement and send "SET LOCK MDOE TO WAIT 17;" as SQL statement straight up, at the start of opening any connection. If you're using a connection pool there is usually a place you can configure SQL statements to be run on connection start.
The Informix JDBC driver does allow you to automatically set the lock wait mode when you connect to the server.
Simply pass via the DataSource or connection URL the following parameter
IFX_LOCK_MODE_WAIT=17
The values for JDBC are
(-1) Wait forever
(0) not wait (default)
(> 0) wait this many seconds
See https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.jdbc.doc/ids_jdbc_040.htm
Connection conn = DriverManager.getConnection ( "jdbc:Informix-sqli://cleo:1550:
IFXHOST=cleo;PORTNO=1550;user=rdtest;password=my_passwd;IFX_LOCK_MODE_WAIT=17";);
I have come across this oracle java tutorial. As a beginner in the topic I cannot grasp why it's needed to set con.setAutocommit(true); at the end of the transaction.
Here is the oracle explanation:
The statement con.setAutoCommit(true); enables auto-commit mode, which
means that each statement is once again committed automatically when
it is completed. Then, you are back to the default state where you do
not have to call the method commit yourself. It is advisable to
disable the auto-commit mode only during the transaction mode. This
way, you avoid holding database locks for multiple statements, which
increases the likelihood of conflicts with other users.
Could you explain it in other words? especially this bit:
This way, you avoid holding database locks for multiple statements,
which increases the likelihood of conflicts with other users.
What do they mean with "holding database locks for multiple statements"?
Thanks in advance.
The database has to perform row-level or table-level locking (based on your database-engine in MySQL) to handle transactions. If you keep the auto-commit mode off and keep executing statements, these locks won't be released until you commit the transactions. Based on the type, other transactions won't be able to update the row/table that is currently locked. setAutocommit(true) basically commits the current transaction, releases the locks currently held, and enables auto-commit, That is, until further required, each individual statement is executed and commited.
row-level locks protect the individual rows that take part in the transaction (InnoDB). Table-level locks prevent concurrent access to the entire table (MyIsam).
When one transaction updates a row in the database others transaction cannot alter this row until the first one finishes (commits or rollbacks), therefore if you do not need transactions it is advisable to set con.setAutocommit(true).
With most modern database systems you can batch together a series of SQL statements. Typically the ones you care about are inserts as these will block out a portion of the space on disk that is being written to. In JDBC this is akin to Statement.addBatch(sql). Now where this becomes problematic is when you try to implement pessimistic or optimistic locks on tuples in the database. So if you have a series of long running transactions that execute multiple batches you can find yourself in a situation where all reads get rejected because of these exclusive locks. I believe in Oracle there is no such thing as the dirty read so this can potentially be mitigated. But imagine the scenario where you are running a job that attempts to delete a record while I am updating it, this is the type of conflict that they are referring to.
With auto-commit on, each part of the batch is saved before moving on to the next unit of work. This is what you see when trying to persist millions of records and it slows down considerably. Because the system is ensuring consistency with each insert statement. There is a quick way to get around this in Oracle (if you are using oracle) is to use the oracle.sql package and look at the ARRAY class.
Most databases will autoCommit by default. That means that as soon as you execute a statement the results will immediately appear in the database and everyone else using the database will immediately see them.
There are times, however, when you need to perform a number of changes on the database which must all be done at once and if one fails you want to back out of all of them.
Say you have a cars database and you come across a new car from a new manufacturer. Here you may wish to create the manufacturer entry in your database and the new car record and make sure they both appear at once for other users. Otherwise there may be a confusing moment in your database where one exists without the other.
To achieve this you switch autoCommit off, execute the statements, commit them and then set autoCommit back on. This last switch on of autoCommit is probably what you are seeing.
I have the below flow in a multi-threaded environment
start transaction
read n number of top rows (based on a column) from db
check some criteria
update those set of rows
commit/rollback the transaction
I am using optimistic locking to handle multi-threaded scenario, but in above situation DB is always returning the same set of rows so if a second thread runs at the same time it will always fail.
Is there a better way to handle this?
Could we force DB to return different set of rows for each transaction using some option?
The reason you are getting the same top n records for all your threads is because of the I in the ACID (atomicity, consistency, isolation, durability) principles of transactions. Isolation means other operations cannot access data that has been modified during a transaction that has not yet completed. So until your threads commit their transactions the other threads cannot see what they have done.
It is possible to change the Isolation level on most databases to one of the following:
SERIALIZABLE
REPEATABLE READ
READ COMMITTED
READ UNCOMMITTED
In your case you probably want READ UNCOMMITTED as it allows one transaction to see uncommitted changes made by some other transaction.
Note: This is almost certainly the wrong isolation level for most applications, and could lead to data corruption. If other application other than the one you described here are accessing the same database you probably don't want to change the isolation level as those application may start to see unexpected and incorrect behaviour.
I have a situation where before doing a particular task I have to check whether a particular flag is set in DB and if it is not set then rest of the processing is done and the same flag is set. Now, in case of concurrent access from 2 different transactions, if first transaction check the flag and being not set it proceeds further. At the same time, I want to restrict the 2nd transaction from checking the flag i.e. I want to restrict that transaction from executing a SELECT query and it can execute the same once the 1st transaction completes its processing and set the flag.
I wanted to implement it at the DB level with locks/hints. But no hint restrict SELECT queries and I cannot go for Isolation level restrictions.
You can create an Application Lock to protect your flag, so the second transaction will not perform SELECT or access the flag if it cannot acquire the Application Lock
I believe that SQL Server 2005 does this natively by not permitting a dirty read. That is, as I understand it, as long as the update / insert occurs before the second user tries to do the select to check the flag, the db will wait for the update / insert to be committed before processing the select.
Here are some common locks that may assist you as well, if you'd like more granularity.
edit : XLOCK may also be of some help. And, wrapping the SQL in a transaction may help as well.
You could try an stored procedure which does both tasks, or as an entry point for 2 distinct stored procedures which does different tasks (something like a proxy).
Stored procedures are monitors in SQL Server, so are artifacts to manage concurrency (what is you want to do).
You just need to simply start a transaction in your SP / code then update the flag. That will block any other user from reading it (unless they are reading uncommitted).
If they are reading uncommitted, set an exclusive lock on your update transaction.