Most of us might be using Spring and Hibernate for data access.
I am trying to understand a few of the internals of the Spring Transaction Manager.
According to the Spring API, it supports different isolation levels - doc
But I couldn't find clear-cut information on which occasions these really help to gain performance improvements.
I am aware that the readOnly parameter of Spring's transaction support can let us use different TxManagers for read-only data and thereby gain some performance. But it locks the table while fetching the data to avoid dirty/uncommitted reads - doc.
Assume that on some occasions we want to blindly insert records into a table and retrieve them without locking the table - a case where we never update the table data, we just insert and read [append-only]. Can we use a better isolation level to gain any performance?
As you can see from one of the reference links, do we really need to implement/write our own custom JpaDialect?
What's the better Isolation for my requirement?
Read-only allows certain optimizations like disabling dirty checking and you should totally use it when you don't plan on changing an entity.
Each isolation level defines how much locking a database has to impose to prevent data anomalies.
Most databases use MVCC (Oracle, PostgreSQL, MySQL), so readers don't block writers and writers don't block readers; only writers block writers.
REPEATABLE_READ doesn't have to hold a lock to prevent a concurrent transaction from modifying the rows loaded by your current transaction. The MVCC engine allows other transactions to read the committed state of a row, even if your current transaction has changed it but hasn't yet committed (MVCC uses the undo logs to recover the previous version of a pending changed row).
In your use case you should use READ_COMMITTED as it scales better than other more strict isolation levels and you should use optimistic locking for preventing lost updates in long conversations.
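As a rough sketch of that recommendation (the LogEntry entity and service names are made up for illustration), the append-only table gets a @Version column so optimistic locking catches any unexpected concurrent update, while the inserts and reads run under READ_COMMITTED:

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import javax.persistence.Version;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical append-only entity; the @Version column enables optimistic locking,
// so any accidental concurrent update is detected instead of being silently lost.
@Entity
class LogEntry {
    @Id
    @GeneratedValue
    Long id;

    String payload;

    @Version
    int version;
}

@Service
public class LogEntryService {

    @PersistenceContext
    private EntityManager em;

    // READ_COMMITTED is usually the database default; on an MVCC engine the insert
    // only takes a row-level lock on the new row, so concurrent readers are not blocked.
    @Transactional(isolation = Isolation.READ_COMMITTED)
    public void append(String payload) {
        LogEntry entry = new LogEntry();
        entry.payload = payload;
        em.persist(entry);
    }
}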
Update
Setting @Transactional(isolation = Isolation.SERIALIZABLE) on a Spring bean behaves differently depending on the current transaction type:
For RESOURCE_LOCAL transactions, the JpaTransactionManager can apply the specific isolation level for the current running transaction.
For JTA resources, the transaction-scoped isolation level doesn't propagate to the underlying database connection, as this is the default JTA transaction manager behavior. You could override this, following the example of the WebLogicJtaTransactionManager.
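For the RESOURCE_LOCAL case, a minimal configuration sketch (assuming an EntityManagerFactory bean is already defined elsewhere) would look like this; with such a setup the isolation attribute declared on @Transactional is applied to the current running transaction:

import javax.persistence.EntityManagerFactory;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Configuration
@EnableTransactionManagement
public class TransactionConfig {

    // RESOURCE_LOCAL setup: JpaTransactionManager manages the EntityManagerFactory
    // directly and can apply the isolation declared on @Transactional per transaction.
    @Bean
    public PlatformTransactionManager transactionManager(EntityManagerFactory emf) {
        return new JpaTransactionManager(emf);
    }
}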
Actually, readOnly=true doesn't cause any lock contention on the database table, because simply no locking is required - the database is able to return previous versions of the records, ignoring new, uncommitted changes.
With readOnly set to true, the flush mode of the current Hibernate Session is set to FlushMode.NEVER (MANUAL in newer versions), preventing the session from flushing changes. In addition, setReadOnly(true) will be called on the JDBC Connection, which is a hint to the driver and the underlying database to enable read-only optimizations.
So readOnly=true may well give you what you are looking for, without having to move to a stricter isolation level (e.g. SERIALIZABLE).
Here is a good explanation.
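To make the usage concrete, a minimal sketch with a hypothetical reporting service:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportingService {

    // No table locks are taken; the session flush mode becomes NEVER/MANUAL and
    // Connection.setReadOnly(true) is passed down to the driver as a hint.
    @Transactional(readOnly = true)
    public void runReport() {
        // execute SELECT queries / load entities here; nothing will ever be flushed
    }
}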
Related
In order to connect to my Oracle DB, I use the thin Oracle JDBC driver and the tomcat JDBC connection pool.
The default setting of the Tomcat pool is not to touch the connection when it is returned to the pool (any associated transaction is not committed or rolled back, and the actual connection is not closed).
The default transaction isolation level of the Oracle driver is READ_COMMITTED, and auto-commit is set to false.
I have several business methods that use the same connection pool to connect to the database. In some business methods, I explicitly create a new transaction, issue some updates and commit that transaction. So far so good.
Some other business methods are read-only. They are read-only because I only issue SELECT queries, not because I've explicitly set READ ONLY mode on the connection (it stays off).
My understanding is that under these conditions, Oracle creates a new implicit transaction the first time it encounters a SELECT statement, and keeps that transaction open until I explicitly COMMIT it.
However, given that I only issue SELECT queries, I don't feel that it is necessary to commit the transaction.
I understand that not committing those read-only transactions before returning their connections to the pool means that the next time a connection is borrowed from the pool, the next business method will execute within a transaction that was already started earlier.
My questions are:
Does it matter?
Given the transaction isolation is set to READ_COMMITTED and auto-commit is false, is there any potential extra data race condition that wouldn't have existed if I committed my read-only transaction?
Does not committing the transaction incur performance costs on the DB server side (the transaction could potentially stay uncommitted for hours)?
Do the answers to the above questions change if we are connecting to RAC instead of a single Oracle DB instance?
What about if I'm selecting from a DB Link?
From all the resources I've read, my conclusion so far is that what I'm doing is technically not correct, but that as long as I don't pick a higher transaction isolation level it also doesn't matter. READ_COMMITTED will read whatever was most recently committed by other transactions, regardless of whether the current transaction is committed or not, and the overhead on the Oracle server for keeping track of a transaction that has no pending modifications is trivial. So fixing this issue falls into the "should be fixed, but not an emergency" category.
Does that make sense or did I misunderstand something important?
Like Alex Poole commented, it's true that the database engine does not necessarily need to create a transaction for certain cases of read operations for correct operation. I'm not familiar with Oracle internals, but I'll trust Alex that running this query after a select query will show that no actual TX was created.
However, the database engine is the lowest level and knows everything. On top of that we have DB client programs, one of which is the JDBC driver. JDBC has its own standard, which includes things like setAutoCommit(), something databases don't know about, and which is used for transaction handling in JDBC code. Note that setAutoCommit(false) is an abstraction, and it does not mean "start transaction"; it just means that operations will be grouped into a transaction in some way.
This is where the JDBC vs. DB engine differences start. I'm basing this on PostgreSQL driver code (as it is more readily available), but I expect the Oracle driver to behave in similar way.
Since the driver cannot easily know whether a TX is needed or not, it cannot do heavy optimization like the actual DB engine. The PostgreSQL driver has a SUPPRESS_BEGIN flag that can be used to indicate that a transaction isn't needed even if one would be started otherwise, but based on a quick look it's used only when fetching metadata, an operation known to be safe from transaction issues. Otherwise the driver will issue a BEGIN to start a transaction even for a SELECT, when autocommit=false.
So even if the DB engine doesn't need a TX, the driver is an abstraction on top of the DB, working with JDBC specs. Since you've observed open transactions, it's safe to assume that the Oracle driver also creates explicit transactions for operations where the Oracle DB engine would not create implicit transactions.
So, does it matter?
As discussed in the comments, it probably won't make a difference when you have some simple selects against local tables, as those transactions aren't holding up significant resources. But throw in DB links and changes to queries over time, and there's a non-zero amount of risk.
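If you decide to fix it, one low-risk option is to end the implicit transaction explicitly in the read-only business methods before the connection goes back to the pool. A hedged sketch (the customers table and the surrounding class are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.sql.DataSource;

public class ReadOnlyQueryExample {

    private final DataSource dataSource; // the Tomcat pool over the Oracle thin driver

    public ReadOnlyQueryExample(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public int countCustomers() throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement("SELECT COUNT(*) FROM customers");
             ResultSet rs = ps.executeQuery()) {
            int count = rs.next() ? rs.getInt(1) : 0;
            // autocommit is false, so the driver opened a transaction for the SELECT;
            // end it here so the connection returns to the pool with nothing open.
            con.rollback();
            return count;
        }
    }
}

Alternatively, the Tomcat JDBC pool exposes rollbackOnReturn/commitOnReturn pool properties that do this on return; verify they exist in the pool version you are running before relying on them.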
In the famous book "Java Persistence with Hibernate" we can read the following:
"A persistence context is a cache of persistent entity instances.... Automatic dirty checking is one of the benefits of this caching. Another benefit is repeatable read for entities and the performance advantage of a unit of work-scoped cache... You don’t have to do anything special to enable the persistence context cache. It’s always on and, for the reasons shown, can’t be turned off."
Does this mean that one can never achieve a transaction isolation level "Read uncommitted" with Hibernate?
Does this mean that one can never achieve a transaction isolation level "Read uncommitted" with Hibernate?
No, it doesn't. Hibernate offers application-level repeatable reads for entities. That's different than DB-level repeatable reads which applies to any query.
So, if you want a custom isolation level, like REPEATABLE_READ for all the queries executed by a given transaction, not just for fetching entities, then you can set it like this:
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void orderProduct(Long productId) {
    ...
}
Now, your question title says:
(How) can we achieve isolation level Read Uncommitted with Hibernate / JPA?
If you are using Oracle or PostgreSQL, you cannot do that, since Read Uncommitted is not supported and you'll get READ_COMMITTED instead.
For SQL Server and MySQL, set it like this:
@Transactional(isolation = Isolation.READ_UNCOMMITTED)
Indeed, Hibernate does offer repeatable reads through its first-level cache (the Persistence Context), as cited in the book
"Transactions and concurrency control" by Vlad Mihalcea.
as follows:
Some ORM frameworks (e.g. JPA/Hibernate) offer application-level repeatable reads. The first snapshot of any retrieved entity is cached in the currently running Persistence Context. Any successive query returning the same database row is going to use the very same object that was previously cached. This way, the fuzzy reads may be prevented even in Read Committed isolation level.
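A minimal sketch of what that application-level repeatable read looks like in practice (the Product entity is hypothetical):

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;

// Hypothetical entity used only for this demonstration
@Entity
class Product {
    @Id
    Long id;
    String name;
}

public class FirstLevelCacheDemo {

    public void demo(EntityManager em, Long productId) {
        Product first = em.find(Product.class, productId);
        // Even if another transaction commits a change to this row in between,
        // the second lookup is served from the Persistence Context, not the database.
        Product second = em.find(Product.class, productId);
        assert first == second; // same Java object: application-level repeatable read
    }
}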
However, according to the same book, it seems that using Spring with JPA/Hibernate allows customizing the transaction isolation level.
In the book above we can also read the following:
Spring supports transaction-level isolation levels when using the JpaTransactionManager. For JTA transactions, the JtaTransactionManager follows the Java EE standard and disallows overriding the default isolation level. As a workaround, the Spring framework provides extension points, so the application developer can customize the default behavior and implement a mechanism to set isolation levels on a transaction basis.
While going through the Hibernate documentation and reading about read-only entities, the official documentation says:
Hibernate does some optimizing for read-only entities:
It saves execution time by not dirty-checking simple properties or single-ended associations.
It saves memory by deleting database snapshots.
I don't understand what it means by deleting database snapshots.
Is it referring to some optimization that happens in the database? If so, how does hibernate inform/hint the DB to do that optimization? Is this optimization a database specific feature and so not guaranteed across the databases?
Or is it referring to an optimization that happens within the Hibernate library? I doubt this is the case, because whether it is read-only or not, the query fired by Hibernate to fetch the records is the same, but I want to make sure I am not missing anything here.
UPDATE: As per the answer from @tgdavies, it helps in that Hibernate does not keep the snapshots, since dirty checking is not needed.
Subsequently, I would like to understand whether there is any link between JDBC readOnly and Hibernate readOnly for enabling DB optimizations. As per Connection.html#setReadOnly it says - Puts this connection in read-only mode as a hint to the driver to enable database optimizations. And what are those hints?
Can someone throw some light on how this optimization is actually achieved?
When Hibernate loads an object into a Session it creates a state snapshot of the current database state of the object, so that it can perform dirty checking against the snapshot.
As a read only object will never be modified, this snapshot is not needed and memory can be saved.
This is not an optimisation related to database access, but to reducing the memory used by the Session.
I doubt that Hibernate sets the JDBC connection to read only -- Hibernate doesn't know what else will happen in the Session. You could log the SQL Hibernate is sending to make sure: How to log final SQL queries with hibernate
I'm not sure what optimisations the database can perform on a read only connection -- probably taking fewer locks in some isolation modes, but that's just hand-waving on my part.
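For completeness, a small sketch of the Session-level read-only switches the documentation refers to (the Product entity is hypothetical); both only affect Hibernate's in-memory bookkeeping, not the SQL it sends:

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.Session;

// Hypothetical entity used only for this sketch
@Entity
class Product {
    @Id
    Long id;
    String name;
}

public class ReadOnlyEntitiesExample {

    public void loadReadOnly(Session session, Long id) {
        // Option 1: every entity loaded by this session from now on is read-only
        session.setDefaultReadOnly(true);
        Product product = session.get(Product.class, id);

        // Option 2: mark a single managed entity as read-only
        session.setReadOnly(product, true);

        // For read-only entities Hibernate keeps no state snapshot, which is the
        // memory saving the documentation mentions, and skips dirty checking at flush.
    }
}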
I have a scenario where I read from a set of tables in a Java service.
I've annotated the service class @Transactional.
Is there any possible way to lock the corresponding rows I read, in all the tables I use, for the duration of my transaction and release them at the end of the transaction?
P.S.: I'm using Spring with Hibernate, and I'm new to this locking concept.
Any material/example links would be of much help.
Thanks
This depends on the underlying database engine and selected transaction isolation level.
Some database systems do locking for reads, and some use MVCC, which means your updates won't be visible to other transactions until your transaction finishes and your transaction will operate on a snapshot of data taken at the start of the transaction.
So a simple answer is: choose appropriately high transaction isolation level (e.g. SERIALIZABLE) for your needs and a database engine that supports it.
http://en.wikipedia.org/wiki/Isolation_(database_systems)
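If you need row-level locks rather than a stricter isolation level, and assuming JPA 2.0+ with a database that supports SELECT ... FOR UPDATE, a pessimistic lock can be requested per read and is released when the transaction ends. A minimal sketch (the Account entity is hypothetical):

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.LockModeType;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical entity used only for this sketch
@Entity
class Account {
    @Id
    Long id;
    String owner;
}

@Service
public class AccountService {

    @PersistenceContext
    private EntityManager em;

    @Transactional
    public void readWithRowLock(Long accountId) {
        // The provider typically issues SELECT ... FOR UPDATE, so the row stays
        // locked until this transaction commits or rolls back.
        Account account = em.find(Account.class, accountId, LockModeType.PESSIMISTIC_READ);
        // ... read other tables the same way if their rows also need protecting
    }
}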
I'm writing some application for GlassFish 2.1.1 (JavaEE 5, JPA 1.0, as far as I know). I have the following code in my servlet (which I mostly borrowed from some sample on the Internet):
@PersistenceContext(name = "persistence/em", unitName = "pu")
private EntityManager em;

@Resource
private UserTransaction utx;

@Override
protected void doPost(...) {
    utx.begin();
    . . . perform retrieving operations on em . . .
    utx.rollback();
}
web.xml has the following in it:
<persistence-context-ref>
    <persistence-context-ref-name>persistence/em</persistence-context-ref-name>
    <persistence-unit-name>pu</persistence-unit-name>
</persistence-context-ref>
The problem is, the em doesn't see changes that have been made in another, outside transaction. Roughly, I make a request to my servlet from a web browser, see the data, perform some DML in a SQL console, reload the servlet page -- and it doesn't show any change. I've tried many combinations of em.flush, utx.rollback, and em.joinTransaction, but it doesn't seem to do any good.
Situation is complicated by me being a total newbie in JPA, so I do not have a clear understanding of how the underlying machinery works. So any help and -- more importantly -- explanations/links of what is happening there would be very appreciated. Thanks!
The JPA implementation maintains a cache of entities that have been accessed. When you perform operations in a different transaction without using JPA, the cache is no longer up to date, and hence you never see the changes made in that other transaction.
If you do wish to see the changes, you will have to refresh the cache, in which case all entities will be evicted from the cache. Of course, you'll need to know when to do this (after the other transaction has completed), otherwise you'll continue to see stale entities. If this is your business need, then JPA is possibly not a good fit for your problem domain.
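As a sketch of the refresh option (JPA 1.0 API only; the entity name is made up), refresh() re-reads the row and overwrites whatever stale state the persistence context had cached:

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;

// Hypothetical entity used only for this sketch
@Entity
class MyEntity {
    @Id
    Long id;
    String name;
}

public class RefreshExample {

    public MyEntity reload(EntityManager em, Long id) {
        MyEntity entity = em.find(MyEntity.class, id);
        if (entity != null) {
            // discard the cached state and reload the currently committed row
            em.refresh(entity);
        }
        return entity;
    }
}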
Related:
Are entities cached in jpa by default ?
Invalidating JPA EntityManager session
As axtavt says, you need to commit the transaction in the console. Assuming you did that, it is also possible data is still being cached by the PersistenceManager (or underlying infrastructure).
To prevent trouble with caching you can evict by hand (which may be tricky as you have to know when to evict) or you can go to pessimistic locking. Pessimistic locking can have a huge impact on performance, but if you have multiple independent connections to the database you may not have a choice.
If your process has concurrent read/writes from different sources the whole time, you may really need pessimistic locks. If you sometimes have a batch update from an external source, you may try to signal, from that batch job, your JPA application that it should evict. Perhaps via a web service or so. That way you would not incur pessimistic locking performance degradation the entire time.
The wise lesson here is that synchronization of processes can be really complicated :)
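A minimal sketch of that "evict on signal" idea, using nothing beyond the JPA 1.0 API (the listener class and how it gets invoked are hypothetical):

import javax.persistence.EntityManager;

public class CacheEvictionListener {

    // Called by whatever signal the batch job sends (e.g. a web service endpoint)
    // once its external updates have been committed
    public void onExternalBatchCompleted(EntityManager em) {
        em.clear(); // detaches every managed entity, so the next reads hit the database
    }
}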
Perhaps you need to commit a transaction made in SQL console.