Hibernate read-only entities: "saves memory by deleting database snapshots" - java

While going through the Hibernate documentation and reading about read-only entities, I found that the official documentation says:
Hibernate does some optimizing for read-only entities:
It saves execution time by not dirty-checking simple properties or single-ended associations.
It saves memory by deleting database snapshots.
I don't understand what it means by "deleting database snapshots".
Is it referring to some optimization that happens in the database? If so, how does Hibernate inform/hint the DB to do that optimization? Is this optimization a database-specific feature, and therefore not guaranteed across databases?
Or is it referring to an optimization that happens within the Hibernate library? I doubt this is the case, because whether it is readOnly or not, the query fired by Hibernate to fetch the records is the same, but I want to make sure I am not missing anything here.
UPDATE: As per the answer from @tgdavies, read-only mode lets Hibernate avoid keeping the snapshots, as dirty checking is not needed.
Subsequently, I would like to understand whether there is any link between JDBC readOnly and Hibernate readOnly for enabling DB optimizations. The Javadoc for Connection#setReadOnly says: "Puts this connection in read-only mode as a hint to the driver to enable database optimizations." And what are those hints?
Can someone shed some light on how this optimization is actually achieved?

When Hibernate loads an object into a Session, it creates a snapshot of the object's current database state, so that it can perform dirty checking against that snapshot.
As a read-only object will never be modified, this snapshot is not needed and memory can be saved.
This is not an optimisation related to database access, but to reducing the memory used by the Session.
I doubt that Hibernate sets the JDBC connection to read-only -- Hibernate doesn't know what else will happen in the Session. You could log the SQL Hibernate is sending to make sure: How to log final SQL queries with hibernate
I'm not sure what optimisations the database can perform on a read-only connection -- probably taking fewer locks in some isolation modes, but that's just hand-waving on my part.
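For reference, here is a minimal sketch of the ways an entity can be marked read-only, which is what triggers the snapshot optimization described above. This assumes the Hibernate 5 API; the sessionFactory variable and the Order entity are hypothetical placeholders.

import java.util.List;
import org.hibernate.Session;

try (Session session = sessionFactory.openSession()) {
    // every entity loaded by this session is read-only: no snapshot is kept
    session.setDefaultReadOnly(true);
    Order order = session.get(Order.class, 1L);

    // alternatively, mark entities read-only per query ...
    List<Order> orders = session.createQuery("from Order", Order.class)
            .setReadOnly(true)
            .list();

    // ... or per managed instance (Hibernate then discards its snapshot)
    session.setReadOnly(order, true);
}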

Related

With Oracle JDBC, what problems will I face if I do not COMMIT select-only transactions

In order to connect to my Oracle DB, I use the thin Oracle JDBC driver and the tomcat JDBC connection pool.
The default setting of tomcat-pool is not to touch the connection when it is returned to the pool (any associated transaction is not committed or rolled back, and the actual connection is not closed).
The default transaction isolation level of the Oracle driver is READ_COMMITTED, and auto-commit is set to false.
I have several business methods that use the same connection pool to connect to the database. In some business methods, I explicitly create a new transaction, issue some updates and commit that transaction. So far so good.
Some other business methods are read-only. They are read-only because I only issue SELECT queries, not because I've explicitly set READ ONLY mode on the connection (it stays off).
My understanding is that under these conditions, Oracle creates a new implicit transaction the first time it encounters a SELECT statement, and keeps that transaction open until I explicitly COMMIT it.
However, given that I only issue SELECT queries, I don't feel that it is necessary to commit the transaction.
I understand that not committing those read-only transactions before returning their connections to the pool means that the next time a connection is borrowed from the pool, the next business method will execute within a transaction that was already started earlier.
My questions are:
Does it matter?
Given the transaction isolation is set to READ_COMMITTED and auto-commit is false, is there any potential extra data race condition that wouldn't have existed if I committed my read-only transaction?
Does not committing the transaction incur performance costs on the DB server side (the transaction could potentially stay uncommitted for hours)?
Do the answers to the above questions change if we are connecting to RAC instead of a single Oracle DB instance?
What about if I'm selecting from a DB Link?
From all the resources I've read, my conclusion so far is that what I'm doing is technically not correct, but that as long as I don't pick a higher transaction isolation level, it also doesn't matter. READ_COMMITTED will read whatever was most recently committed by other transactions, regardless of whether the current transaction is committed or not, and the overhead on the Oracle server for keeping track of a transaction that has no pending modifications is trivial. So fixing this issue falls into the "should be fixed, but not an emergency" category.
Does that make sense or did I misunderstand something important?
Like Alex Poole commented, it's true that the database engine does not necessarily need to create a transaction for certain read operations to work correctly. I'm not familiar with Oracle internals, but I'll trust Alex that running this query after a select query will show that no actual TX was created.
However, the database engine is the lowest level and knows everything. On top of that we have DB client programs, one of which is the JDBC driver. JDBC has its own standard, which includes things like setAutoCommit(), a thing that databases don't know about, and which is used for transaction handling in JDBC code. Note that setAutoCommit(false) is an abstraction, and it does not mean "start transaction"; it just means that operations will be grouped into a transaction in some way.
This is where the JDBC vs. DB engine differences start. I'm basing this on the PostgreSQL driver code (as it is more readily available), but I expect the Oracle driver to behave in a similar way.
Since the driver cannot easily know whether a TX is needed or not, it cannot do heavy optimization like the actual DB engine. The PostgreSQL driver has a SUPPRESS_BEGIN flag that can be used to indicate that a transaction isn't needed even if one would be started otherwise, but based on a quick look it's used only when fetching metadata, an operation known to be safe from transaction issues. Otherwise the driver will issue a BEGIN to start a transaction even for a SELECT, when autocommit=false.
So even if the DB engine doesn't need a TX, the driver is an abstraction on top of the DB, working with JDBC specs. Since you've observed open transactions, it's safe to assume that the Oracle driver also creates explicit transactions for operations where the Oracle DB engine would not create implicit transactions.
So, does it matter?
As discussed in the comments, it probably won't make a difference when you have some simple selects to local tables, as the transactions aren't holding up significant resources. But throw in DB links and changes to queries over time, and there's a non-zero amount of risk.
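To make the fix concrete, here is a hedged sketch of explicitly ending a select-only transaction before the connection goes back to the pool. The dataSource variable, the customers table and the process() helper are hypothetical placeholders.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

try (Connection con = dataSource.getConnection();
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery("SELECT name FROM customers")) {
    while (rs.next()) {
        process(rs.getString(1));
    }
    // autoCommit is false, so the driver opened a transaction for the SELECT;
    // commit it here instead of leaving it open when the connection is returned
    con.commit();
}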

Spring Transaction Isolation Level

Most of us might be using Spring and Hibernate for data access.
I am trying to understand a few of the internals of the Spring Transaction Manager.
According to the Spring API, it supports different isolation levels - doc
But I couldn't find clear-cut information on which occasions these are really helpful for gaining performance improvements.
I am aware that the readOnly parameter of a Spring transaction can help us use different TxManagers for read-only data and leverage good performance. But it locks the table to get the data, to avoid dirty reads/non-committed reads - doc.
Assume that, on a few occasions, we might want to blindly insert records into a table and retrieve the information without locking the table - a case where we never update the table data, we just insert and read [append-only]. Can we use a better isolation level to gain any performance?
As you can see from one of the reference links, do we really need to implement/write our own custom JPA Dialect?
What's the better isolation level for my requirement?
Read-only allows certain optimizations like disabling dirty checking and you should totally use it when you don't plan on changing an entity.
Each isolation level defines how much locking a database has to impose to prevent data anomalies.
Most databases use MVCC (Oracle, PostgreSQL, MySQL), so readers don't lock writers and writers don't lock readers. Only writers lock writers, as you can see in the following example.
REPEATABLE_READ doesn't have to hold a lock to prevent a concurrent transaction from modifying the rows your current transaction has loaded. The MVCC engine allows other transactions to read the committed state of a row, even if your current transaction has changed it but hasn't yet committed (MVCC uses the undo logs to recover the previous version of a pending changed row).
In your use case you should use READ_COMMITTED as it scales better than other more strict isolation levels and you should use optimistic locking for preventing lost updates in long conversations.
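As a quick illustration of the optimistic locking suggestion: a version attribute on the entity is enough. The Invoice entity below is a hypothetical example.

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Invoice {

    @Id
    private Long id;

    // Hibernate appends "where version = ?" to every UPDATE and fails the
    // transaction if another one modified the row in between, so lost
    // updates are detected instead of silently overwritten
    @Version
    private int version;
}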
Update
Setting @Transactional(isolation = Isolation.SERIALIZABLE) on a Spring bean has a different behaviour, depending on the current transaction type:
For RESOURCE_LOCAL transactions, the JpaTransactionManager can apply the specific isolation level for the current running transaction.
For JTA resources, the transaction-scoped isolation level doesn't propagate to the underlying database connection, as this is the default JTA transaction manager behavior. You could override this, following the example of the WebLogicJtaTransactionManager.
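Putting it together, here is a hedged sketch of per-method readOnly/isolation settings with Spring's annotation-driven transactions. The service, repository and Report entity names are hypothetical.

import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

    private final ReportRepository repository;

    public ReportService(ReportRepository repository) {
        this.repository = repository;
    }

    // read-only: Hibernate can skip dirty checking and entity snapshots
    @Transactional(readOnly = true, isolation = Isolation.READ_COMMITTED)
    public List<Report> findReports() {
        return repository.findAll();
    }

    // only the problematic method pays the price of the strictest level
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void reconcileReports() {
        // select-then-update logic goes here
    }
}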
Actually, readOnly=true doesn't cause any lock contention on the database table, because simply no locking is required - the database is able to revert to previous versions of the records, ignoring all new changes.
With readOnly set to true, the flush mode is set to FlushMode.NEVER in the current Hibernate Session, preventing the session from flushing changes. In addition, setReadOnly(true) will be called on the JDBC Connection, which is also a hint to the underlying database not to commit changes.
So readOnly=true is exactly what you are looking for (e.g. with a SERIALIZABLE isolation level).
Here is a good explanation.

How to coordinate J2EE and Java EE database access?

We have a somewhat huge application which started a decade ago and is still under active development. So some parts are still in the J2EE 1.4 architecture, others are using Java EE 5/6.
While testing some new code, I realized that I had data inconsistency between information coming in through old and new code parts, where the old one uses the Hibernate Session directly and the new one an injected EntityManager. This led to the problem that one part couldn't see new data from the other part and thus also created a database record, resulting in a primary key constraint violation.
It is planned to migrate the old code completely to get rid of J2EE, but in the meantime - what can I do to coordinate database access between the two parts? And shouldn't both ways come together at some point within the application server, in the Hibernate layer, regardless of whether the database is accessed via JPA or directly?
You can mix both the Hibernate Session and the Entity Manager in the same application without any problem. The EntityManagerImpl simply delegates calls to a private SessionImpl instance.
What you describe is a transaction configuration anomaly. Every database transaction runs in isolation (unless you use READ_UNCOMMITTED, which I guess is not the case), but once you commit it, the changes are available to any other transaction or connection. So once a transaction is committed, you should see all changes in any other Hibernate Session, JDBC connection or even your database UI manager tool.
You said that there was a primary key conflict. This can't happen if you use the Hibernate identity or sequence generators. With the old hi/lo identifier generator, you can have problems if an external connection tries to insert records into the same table that Hibernate uses.
This problem can also occur if there is a master/master replication anomaly. If you have multiple nodes and there is no strict consistency replication, you can end up with primary key constraint violations.
Update
Solution 1:
When coordinating the new and the old code trying to insert the same entity, you could have select-then-insert logic running in a SERIALIZABLE transaction. The SERIALIZABLE transaction acquires the appropriate locks on your behalf, so you can still have a default READ_COMMITTED isolation level while only the problematic service methods are marked as SERIALIZABLE.
So both the old code and the new code have this logic running a select for checking if there is already a row satisfying the select constraint, only to insert it if nothing is found. The SERIALIZABLE isolation level prevents phantom reads so I think it should prevent constraint violations.
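A minimal sketch of that select-then-insert logic, assuming Spring-managed transactions, an injected entityManager, and a hypothetical Product entity with a unique code column:

import java.util.List;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Transactional(isolation = Isolation.SERIALIZABLE)
public Product findOrCreate(String code) {
    List<Product> found = entityManager
            .createQuery("select p from Product p where p.code = :code", Product.class)
            .setParameter("code", code)
            .getResultList();
    if (!found.isEmpty()) {
        return found.get(0);
    }
    // SERIALIZABLE prevents a concurrent transaction from inserting the same
    // row between our select and this insert (no phantom reads)
    Product created = new Product(code);
    entityManager.persist(created);
    return created;
}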
Solution 2:
If you are open to delegating this task to JDBC, you might also investigate the MERGE SQL statement, if your current database supports it. Basically, this is an upsert operation issuing an update or an insert behind the scenes. This command is much more attractive since you can still run it even on READ_COMMITTED. The only drawback is that you can't use Hibernate for it, and only some databases support it.
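For illustration, a hedged JDBC sketch using Oracle-style MERGE syntax; the table and column names are made up, and the connection variable is assumed to exist:

import java.sql.PreparedStatement;

String sql =
    "MERGE INTO product p " +
    "USING (SELECT ? AS code, ? AS name FROM dual) src " +
    "ON (p.code = src.code) " +
    "WHEN MATCHED THEN UPDATE SET p.name = src.name " +
    "WHEN NOT MATCHED THEN INSERT (code, name) VALUES (src.code, src.name)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setString(1, "ABC-1");
    ps.setString(2, "Widget");
    ps.executeUpdate(); // a single atomic upsert, fine under READ_COMMITTED
}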
If you instantiate a SessionFactory for the old code and an EntityManagerFactory for the new code separately, that can lead to different values in the first-level cache. If, during a single HTTP request, you change a value in the old code but do not immediately commit, the value will be changed in the session cache, but it will not be available to the new code until it is committed. Independently of any transaction or database locking that would protect persistent values, that mix of two different Hibernate sessions can give weird results for in-memory values.
I assume that the injected EntityManager still uses Hibernate. IMHO the most robust solution is to get the EntityManagerFactory for the PersistenceUnit and cast it to a Hibernate EntityManagerFactoryImpl. Then you can directly access the underlying SessionFactory:
SessionFactory sessionFactory = ((EntityManagerFactoryImpl) entityManagerFactory).getSessionFactory();
You can then safely use this SessionFactory in your old code, because now it is unique in your application and shared between old and new code.
You still have to deal with the problem of session creation/closing and transaction management. I suppose it is already implemented in the old code. Without knowing more, I think that you should port it to JPA, because I am pretty sure that if an EntityManager exists, sessionFactory.getCurrentSession() will give you its underlying Session, but I cannot affirm anything for the opposite direction.
I've run into a similar problem when I had a list of enumerated lookup values, where two pieces of code would check for the existence of a given value in the list, and if it didn't exist the code would create a new entry in the database. When both of them came across the same non-existent value, they'd both try to create a new one and one would have its transaction rolled back (throwing away a bunch of other work we'd done in the transaction).
Our solution was to create those lookup values in a separate transaction that committed immediately; if that transaction succeeded, then we knew we could use that object, and if it failed, then we knew we simply needed to perform a get to retrieve the one saved by another process. Once we had a lookup object that we knew was safe to use in our session, we could happily do the rest of the DB modifications without risking the transaction being rolled back.
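In Spring terms this maps to REQUIRES_NEW propagation. A hedged sketch follows; the LookupValue entity, the lookupCreator bean and the findByValue() helper are hypothetical, and note that the two methods must live on different beans so the transactional proxy actually applies:

import javax.persistence.PersistenceException;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// bean 1: creates the lookup value in its own short-lived transaction
@Transactional(propagation = Propagation.REQUIRES_NEW)
public LookupValue create(String value) {
    LookupValue lv = new LookupValue(value);
    entityManager.persist(lv);
    return lv; // committed as soon as this method returns
}

// bean 2: falls back to a plain read if another process won the race
public LookupValue getOrCreate(String value) {
    try {
        return lookupCreator.create(value);
    } catch (PersistenceException e) { // e.g. unique constraint violation
        return findByValue(value);     // the row the other process committed
    }
}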
It's hard to know from your description whether your data model would lend itself to a similar approach, where you'd at least commit the initial version of the entity right away, and then once you're sure you're working with a persistent object you could do the rest of the DB modifications that you knew you needed to do. But if you can find a way to make that work, it would avoid the need to share the Session between the different pieces of code (and would work even if the old and new code were running in separate JVMs).

Eclipselink/JPA persist a record once or insert each field separately?

I have a question about the persist and merge strategy of EclipseLink. I would like to know how EclipseLink/JPA inserts and updates records. Does it insert/update them one by one into the database? Or does it save them in a log and then flush them to the database?
It is important for me because I am going to have a history table with a trigger that fires on insert and update. So if, for example, an update happens on each field separately and 3 fields are updated, will I have 3 records in the history table, or one?
I would appreciate it if anyone could answer and also leave some reference links for further information.
The persistence provider is quite free to flush changes whenever it sees fit. So you cannot reliably predict the number of update callbacks or the expected SQL statements.
In general, the provider will flush changes before each query to make changes in the persistence context available to the query. You can hint the provider to defer the flush until commit time, but the provider still can flush at will.
Please see the relevant chapters of the JPA (2.0) spec:
§3.2.4 Synchronization to the Database
§3.8.7 Queries and Flush Mode
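A small sketch of the flush-mode hint mentioned above, using the plain JPA API; the entityManager variable and the Audit entity are placeholders:

import java.util.List;
import javax.persistence.FlushModeType;

List<Audit> rows = entityManager
        .createQuery("select a from Audit a", Audit.class)
        .setFlushMode(FlushModeType.COMMIT) // hint: defer flushing until commit
        .getResultList();
// per §3.2.4/§3.8.7 this is only a hint - the provider may still flush earlier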
EDIT: There is an important point about flushing and transaction isolation. The changes are flushed to the database and the lifecycle listeners are invoked, but the data is not committed and not visible to other transactions - read-committed isolation is the default. The commit itself is atomic.
I am not sure what the consequences of a server crash would be, but under normal circumstances, data integrity is ensured.

How long are Entities persisted?

I am trying to understand how JPA works. From what I know, if you persist an entity, that object will remain in memory until the application is closed. This means that when I look for a previously persisted entity, there will be no query made against the database. Assuming that no insert, update or delete is made, if the application runs long enough, all the information in the database might end up in memory. Does this mean that at some point I will no longer need the database?
Edit
My problem is not with the database. I am sure that the database can not be modified from outside the application. I am managing transactions by myself, so the data gets stored in the database as soon as I commit. My question is: What happens with the entities after I commit? Are they kept in the memory and act like a cache? If so, how long are they kept there? After I commit a persist, I make a select query. This select should return the object I persisted before. Will that object be brought from memory, or will the application query the database?
Not really. Think about it.
Your application probably isn't the only thing that will use the database. If an entity was persisted once and stored in memory, how can you be sure that, let's say, one hour later, it won't be changed by some other means? If that happens, you will have stale data that can harm the logic of your application.
Storing data in memory and hoping that everything will be alright won't bring any benefits. That's why data stored in database is your primary source of information, and you should query it every time, unless you are absolutely sure that a subset of data won't change.
When you persist an entity, it is added to the persistence context, which acts like a first-level cache (this is in-memory). When the actual persisting happens depends on whether you use container-managed transactions or deal with transactions yourself. The entity instance will live in memory as long as the transaction is not committed; when it is, it will be persisted to the database or XML etc.
JPA can't work with only the persistence context (L1 cache) or the explicit cache (L2 cache). It always needs to be combined with a datasource, and this datasource typically points to a database that persists to stable storage.
So, the entity is in memory only as long as the transaction (which is required for JPA persist operations) isn't committed. After that it's sent to the datasource.
If the transaction manager is transaction-scoped (the 'normal' case), then the L1 cache (the persistence context) is closed and the entities no longer exist there. If the L1 cache somehow bothers you, you can manage it a bit explicitly. There are operations to clear it, and you could separate your read operations (which don't need transactions) from write operations. If there's no transaction active when reading, there's no persistence context, so an entity never becomes attached and is thus never put into this L1 cache.
The L2 cache however is not cleared when the transaction commits and entities inside it remain available for the entire application. This L2 cache must be explicitly configured and you as an application developer must indicate which entities should be cached in it. Via vendor specific mechanisms (e.g. JBoss Cache, Infinispan) you can put a max on the number of entities being cached and set/define so-called eviction policies.
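As a small illustration, JPA 2.0 lets you opt an entity into the shared (L2) cache declaratively; the eviction policies and size limits mentioned above are then configured in the provider. The Country entity below is a placeholder:

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
@Cacheable // eligible for the L2 cache when persistence.xml sets <shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
public class Country {

    @Id
    private Long id;

    private String name;
}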
Of course, nothing prevents you from letting the datasource point to an in-memory embedded DB, but this is outside the knowledge of JPA.
Persistence means, in short: you can shut down your app, and the data is not lost.
To achieve that you need a database or some sort of saving data in a way that it's not lost when you shut down the app.
To "persist" an entity means to actually save it in the data base. Sure, JPA maintains some entity information in memory in the persistence context (and this is highly dependent on configuration and programming practices), but at certain point information will be stored in the data base - for instance, when a transaction commits, or likely (but not necessarily) after flush() or merge() operations.
If you want to keep your entities after committing and for a select query, you need to use the query cache. Just Google around on that term and it should be clear to you.
