We have a fairly large application that was started a decade ago and is still under active development. As a result, some parts still use the J2EE 1.4 architecture while others use Java EE 5/6.
While testing some new code, I noticed a data inconsistency between information coming in through the old and the new code paths: the old code uses the Hibernate Session directly, while the new code uses an injected EntityManager. As a result, one part could not see new data written by the other part and created the same database record again, resulting in a primary key constraint violation.
We plan to migrate the old code completely to get rid of J2EE, but in the meantime, what can I do to coordinate database access between the two parts? And shouldn't both paths come together at some point in the Hibernate layer inside the application server, regardless of whether it is accessed via JPA or directly?
You can mix both Hibernate Session and EntityManager in the same application without any problem. The EntityManagerImpl simply delegates calls to a private SessionImpl instance.
What you describe is a transaction configuration anomaly. Every database transaction runs in isolation (unless you use READ_UNCOMMITTED, which I guess is not the case), but once you commit, the changes are visible to any other transaction or connection. So once a transaction is committed, you should see all changes in any other Hibernate Session, JDBC connection, or even your database UI tool.
You said there was a primary key conflict. This can't happen if you use the Hibernate identity or sequence generator. With the old hi/lo generator, however, you can run into problems if an external connection inserts records into a table for which Hibernate uses a hi/lo identifier generator.
This problem can also occur if there is a master/master replication anomaly. If you have multiple nodes and replication is not strictly consistent, you can end up with primary key constraint violations.
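To make the hi/lo case more concrete, here is a minimal sketch in plain Java. It is a simplified model, not Hibernate's actual generator: HiLoSketch and HiLoCollisionDemo are hypothetical names, the table is just a set of primary keys, and the external writer is assumed to pick MAX(id) + 1.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of a hi/lo allocator: one "database round trip" reserves
// a whole block of ids, which are then handed out from memory.
class HiLoSketch {
    private final int blockSize;
    private long nextHi = 0;      // stands in for the hi value stored in the database
    private long next = 0;        // next id to hand out from the current block
    private long blockEnd = 0;    // exclusive upper bound of the current block

    HiLoSketch(int blockSize) { this.blockSize = blockSize; }

    long nextId() {
        if (next == blockEnd) {           // block exhausted: reserve a new one
            next = nextHi * blockSize + 1;
            blockEnd = next + blockSize;
            nextHi++;
        }
        return next++;
    }
}

class HiLoCollisionDemo {
    // Returns true if the allocator's second id collides with a row inserted by an
    // external writer that picks MAX(id) + 1 without knowing about the reserved block.
    static boolean externalInsertCollides() {
        Set<Long> table = new HashSet<>();       // primary keys currently in the table
        HiLoSketch generator = new HiLoSketch(10);

        table.add(generator.nextId());           // insert through the generator
        long externalId = table.stream().mapToLong(Long::longValue).max().getAsLong() + 1;
        table.add(externalId);                   // external insert takes the next free id

        long second = generator.nextId();        // the generator hands out the same id
        return !table.add(second);               // add() returns false on a duplicate key
    }

    public static void main(String[] args) {
        System.out.println("collision: " + externalInsertCollides());
    }
}
```

The point of the sketch is only that the generator believes it owns the whole reserved block, so any writer that bypasses it can take an id the generator will hand out later.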
Update
Solution 1:
When the new and the old code may both try to insert the same entity, you could run a select-then-insert logic in a SERIALIZABLE transaction. The SERIALIZABLE transaction acquires the appropriate locks on your behalf, so you can keep READ_COMMITTED as the default isolation level and mark only the problematic service methods as SERIALIZABLE.
So both the old code and the new code run the same logic: a select that checks whether a row satisfying the constraint already exists, followed by an insert only if nothing is found. The SERIALIZABLE isolation level prevents phantom reads, so I think it should prevent the constraint violations.
Solution 2:
If you are open to delegating this task to JDBC, you might also investigate the MERGE SQL statement, if your database supports it. Basically, this is an upsert operation that issues an update or an insert behind the scenes. This command is much more attractive because you can run it even under READ_COMMITTED. The drawbacks are that you can't use Hibernate for it and that only some databases support it.
If you separately instantiate a SessionFactory for the old code and an EntityManagerFactory for the new code, the two can end up with different values in their first-level caches. If, during a single HTTP request, you change a value in the old code but do not commit immediately, the value is changed in that session's cache but is not visible to the new code until it is committed. Independently of any transaction or database locking that protects the persistent values, mixing two different Hibernate sessions this way can produce weird results for in-memory values.
Admittedly, the injected EntityManager still uses Hibernate. IMHO the most robust solution is to get the EntityManagerFactory for the persistence unit and cast it to Hibernate's EntityManagerFactoryImpl. Then you can directly access the underlying SessionFactory:
SessionFactory sessionFactory = ((EntityManagerFactoryImpl) entityManagerFactory).getSessionFactory();
You can then safely use this SessionFactory in your old code, because it is now unique in your application and shared between old and new code.
You still have to deal with session creation and closing, and with transaction management. I suppose this is already implemented in the old code. Without knowing more, I think you should port it to JPA, because I am pretty sure that if an EntityManager exists, sessionFactory.getCurrentSession() will return its underlying Session, but I cannot say the same for the opposite direction.
I've run into a similar problem when I had a list of enumerated lookup values, where two pieces of code would check for the existence of a given value in the list, and if it didn't exist the code would create a new entry in the database. When both of them came across the same non-existent value, they'd both try to create a new one and one would have its transaction rolled back (throwing away a bunch of other work we'd done in the transaction).
Our solution was to create those lookup values in a separate transaction that committed immediately; if that transaction succeeded, then we knew we could use that object, and if it failed, then we knew we simply needed to perform a get to retrieve the one saved by another process. Once we had a lookup object that we knew was safe to use in our session, we could happily do the rest of the DB modifications without risking the transaction being rolled back.
It's hard to know from your description whether your data model would lend itself to a similar approach, where you'd at least commit the initial version of the entity right away, and then once you're sure you're working with a persistent object you could do the rest of the DB modifications that you knew you needed to do. But if you can find a way to make that work, it would avoid the need to share the Session between the different pieces of code (and would work even if the old and new code were running in separate JVMs).
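The "insert in its own short transaction, fall back to a read on failure" idea can be sketched as follows. This is an in-memory stand-in, not our production code: LookupTableSketch is a hypothetical class, the map plays the role of a table with a unique constraint on the name, and the IllegalStateException stands in for the constraint violation the database would raise.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a lookup table with a unique constraint on "name".
class LookupTableSketch {
    private final Map<String, Long> rows = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    // Models "INSERT in its own transaction": fails if the name already exists.
    long insert(String name) {
        long id = ids.incrementAndGet();
        if (rows.putIfAbsent(name, id) != null) {
            throw new IllegalStateException("unique constraint violated: " + name);
        }
        return id;
    }

    Long find(String name) { return rows.get(name); }

    // Try the short insert first; if another writer won the race, read its row instead.
    long getOrCreate(String name) {
        try {
            return insert(name);
        } catch (IllegalStateException duplicate) {
            return find(name);
        }
    }
}
```

With a real database, the insert would run in a separate transaction that commits immediately, so a failure there cannot roll back the surrounding unit of work.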
This seems like it would come up often, but I've Googled to no avail.
Suppose you have a Hibernate entity User. You have one User in your DB with id 1.
You have two threads running, A and B. They do the following:
A gets user 1 and closes its Session
B gets user 1 and deletes it
A changes a field on user 1
A gets a new Session and merges user 1
All my testing indicates that the merge attempts to find user 1 in the DB (it can't, obviously), so it inserts a new user with id 2.
My expectation, on the other hand, would be that Hibernate would see that the user being merged was not new (because it has an ID). It would try to find the user in the DB, which would fail, so it would not attempt an insert or an update. Ideally it would throw some kind of concurrency exception.
Note that I am using optimistic locking through @Version, and that does not help matters.
So, questions:
1. Is my observed Hibernate behaviour the intended behaviour?
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
3. If the answer to 2. is yes, why is nobody complaining about it?
Please see the text from hibernate documentation below.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance.
It clearly states that the state (data) of the given object is copied onto the object in the database. If the object is not there, a copy of that data is saved. When it says "save a copy", Hibernate always creates a record with a new identifier.
Hibernate's merge function works roughly as follows:
It checks the state of the entity (attached to or detached from the session) and finds it detached.
It then tries to load the entity by its identifier, but does not find it in the database.
Because the entity is not found, it treats the entity as transient.
A transient entity always results in a new database record with a new identifier.
Locking is always applied to attached entities. If the entity is detached, Hibernate will always load it, and the version value gets updated.
Locking is used to control concurrency problems, and this is not a concurrency issue.
I've been looking at JSR-220, from which Session#merge claims to get its semantics. The JSR is sadly ambiguous, I have found.
It does say:
Optimistic locking is a technique that is used to insure that updates
to the database data corresponding to the state of an entity are made
only when no intervening transaction has updated that data since the
entity state was read.
If you take "updates" to include general mutation of the database data, including deletes, and not just a SQL UPDATE, which I do, I think you can make an argument that the observed behaviour is not compliant with optimistic locking.
Many people agree, given the comments on my question and the subsequent discovery of this bug.
From a purely practical point of view, the behaviour, compliant or not, could lead to quite a few bugs, because it is contrary to many developers' expectations. There does not seem to be an easy fix for it. In fact, Spring Data JPA seems to ignore this issue completely by blindly using EM#merge. Maybe other JPA providers handle this differently, but with Hibernate this could cause issues.
I'm actually working around this by using Session#update currently. It's really ugly, and requires code to handle the case when you try to update an entity that is detached, and there's a managed copy of it already. But, it won't lead to spurious inserts either.
1. Is my observed Hibernate behaviour the intended behaviour?
The behavior is correct. You are just trying to perform operations that are not protected against concurrent data modification :) If you have to split the operation across two sessions, find the object again before updating it and check whether it is still there; throw an exception if it is not. If it is there, lock it using em.find(class, primary key, LockModeType), or use @Version or @Entity(optimisticLock=OptimisticLockType.ALL/DIRTY/VERSION), to protect the object until the end of the transaction.
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
Probably yes.
3. If the answer to 2. is yes, why is nobody complaining about it?
Because if you protect your operations using pessimistic or optimistic locking, the problem will disappear :)
The problem you are trying to solve is called a non-repeatable read.
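The "find it again and throw if it is gone" check could look roughly like this. It is a sketch with hypothetical names (GuardedMerge, mergeIfPresent): the finder and merger are plain functions so the example runs without a database, and in real code they would presumably be backed by something like em.find and em.merge inside one transaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.UnaryOperator;

class GuardedMerge {
    // Re-reads the row by id and only merges when it still exists; otherwise it fails
    // loudly instead of letting the merge turn into an INSERT.
    static <T, ID> T mergeIfPresent(ID id, T detached,
                                    Function<ID, Optional<T>> finder,
                                    UnaryOperator<T> merger) {
        if (!finder.apply(id).isPresent()) {
            throw new IllegalStateException("Row " + id + " was deleted concurrently");
        }
        return merger.apply(detached);
    }

    public static void main(String[] args) {
        Map<Long, String> table = new HashMap<>();   // stand-in for the database
        table.put(1L, "alice");
        table.remove(1L);                            // thread B deletes user 1
        try {
            mergeIfPresent(1L, "alice-renamed",
                    id -> Optional.ofNullable(table.get(id)),
                    value -> { table.put(1L, value); return value; });
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

Note that the check only helps if the lookup and the merge happen in the same transaction and the row is locked in between; without a lock there is still a window in which the row can disappear.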
ThreadLocal<Session> tl = new ThreadLocal<Session>();
tl.set(session);
To get the session:
Employee emp = (Employee) tl.get().get(Employee.class, 1);
If our application is web based, the web container creates a separate thread for each request.
If all these requests use the same single Session object concurrently, we will get
unwanted results in our database operations.
To avoid this, it is considered good practice to put the session in a ThreadLocal object,
which does not allow concurrent usage of the session. I think that, if this is correct, application performance should be very poor.
What is a good approach in the above scenario?
If I'm on the wrong track, in which situations do we need to go for ThreadLocal?
I'm new to Hibernate, so please excuse me if this question is silly.
Thanks in advance.
Putting the Hibernate Session in ThreadLocal is unlikely to achieve the isolation between requests that you want. Surely you create a new Session for each request using a SessionFactory backed by a connection pooling implementation of DataSource, which means that the local reference to the Session is on the stack anyway. Changing that local reference to a member variable only complicates the code, imho.
Anyhow, ensuring isolation within a single container doesn't address the actual problem - how is data accessed efficiently while maintaining consistency within a multi-threaded environment.
There are two parts to the problem you mention - the first is that a database connection is an expensive resource, the second that you need to ensure some level of data consistency between threads/requests.
The general approach to the resource problem is to use a database connection pool (which I'd guess you're already doing). As each request is processed, connections are obtained from the pool and returned when finished but importantly the connections in the pool are maintained beyond the lifetime of a request thus avoiding the cost of creating a connection each time it is needed.
The consistency problem is a little trickier and there's no one size fits all model. What you need to be doing is thinking about what level of consistency you need - questions like does it matter if data is read at the same time it's being written, do updates absolutely have to be atomic, etc.
Once you know the answers to these questions, there are two places where you need to look at consistency: in the database and in the code.
With the database, you need to look at database-level locks and create a scheme suitable for your application by applying the appropriate isolation levels.
With the code, things are a little more complicated. Data is often loaded and displayed for a period of time before updates are written back. That is no problem if there's a single user, but in a multi-user system it's possible that updates are made based on stale data or that multiple updates occur simultaneously. It may be acceptable to have a policy of last update wins, in which case it's simple, but if not, you'll need to use version numbers or old/new comparisons to ensure integrity at the time the updates are applied.
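As an illustration of the version-number approach, here is a small in-memory sketch. VersionedRow and Snapshot are hypothetical names and the "row" lives in an AtomicReference rather than a database; the idea is only that an update is accepted when the version it was based on is still the current one.

```java
import java.util.concurrent.atomic.AtomicReference;

// One "row" holding a value plus a version counter, updated with compare-and-set.
class VersionedRow {
    static final class Snapshot {
        final String value;
        final long version;
        Snapshot(String value, long version) { this.value = value; this.version = version; }
    }

    private final AtomicReference<Snapshot> current;

    VersionedRow(String initial) { current = new AtomicReference<>(new Snapshot(initial, 0)); }

    Snapshot read() { return current.get(); }

    // Applies the update only if nobody has changed the row since 'basedOn' was read.
    boolean update(Snapshot basedOn, String newValue) {
        Snapshot latest = current.get();
        if (latest.version != basedOn.version) {
            return false;                       // stale data: the caller must reload and retry
        }
        return current.compareAndSet(latest, new Snapshot(newValue, latest.version + 1));
    }
}
```

If two users read the same snapshot, the first update succeeds and bumps the version, and the second one is rejected instead of silently overwriting the first.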
I am not sure whether you are required to use ThreadLocal. Using a ThreadLocal to store the session object is definitely not a good idea, especially when you are using Hibernate along with Spring.
A typical scheme for using Hibernate with Spring is:
Inject the sessionFactory in your DAO. I assume that you have sessionFactory already configured which is backed by a pooled datasource.
Now in your DAO class, a session can be accessed as follows.
Session session = sessionFactory.getCurrentSession();
Here is a link to related article.
Please note that this example is specific to the Hibernate 3.x APIs. It takes care of session creation, closure, and thread safety internally, and it's neat too.
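To show what a thread-bound "current session" means in principle, here is a sketch using only the JDK. It is not Hibernate's implementation: FakeSession and CurrentSessionSketch are hypothetical names, and the session is an empty placeholder object.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a Hibernate Session; it only needs an identity here.
class FakeSession { }

// Hands out one session per thread, creating it lazily on first use.
class CurrentSessionSketch {
    private final ThreadLocal<FakeSession> holder;

    CurrentSessionSketch(Supplier<FakeSession> factory) {
        this.holder = ThreadLocal.withInitial(factory);
    }

    FakeSession currentSession() { return holder.get(); }

    // Must be called when the unit of work ends, or pooled threads would reuse it.
    void close() { holder.remove(); }

    // Returns true if a second thread sees a different session than the caller.
    static boolean otherThreadGetsOwnSession(CurrentSessionSketch sessions) {
        FakeSession mine = sessions.currentSession();
        FakeSession[] theirs = new FakeSession[1];
        Thread other = new Thread(() -> theirs[0] = sessions.currentSession());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return theirs[0] != null && theirs[0] != mine;
    }
}
```

Each thread works on its own instance, so threads do not wait for each other; that is why binding a session to a thread does not by itself hurt throughput.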
I am trying to understand how JPA works. From what I know, if you persist an Entity, that object will remain in the memory until the application is closed. This means, that when I look for a previously persisted entity, there will be no query made on the database. Assuming that no insert, update or delete is made, if the application runs long enough, all the information in it might become persistent. Does this mean that at some point, I will no longer need the database?
Edit
My problem is not with the database. I am sure that the database can not be modified from outside the application. I am managing transactions by myself, so the data gets stored in the database as soon as I commit. My question is: What happens with the entities after I commit? Are they kept in the memory and act like a cache? If so, how long are they kept there? After I commit a persist, I make a select query. This select should return the object I persisted before. Will that object be brought from memory, or will the application query the database?
Not really. Think about it.
Your application probably isn't the only thing that will use the database. If an entity was persisted once and stored in memory, how can you be sure that, let's say, one hour later, it won't be changed by some other means? If that happens, you will have stale data that can harm logic of your application.
Storing data in memory and hoping that everything will be alright won't bring any benefits. That's why data stored in database is your primary source of information, and you should query it every time, unless you are absolutely sure that a subset of data won't change.
When you persist an entity, it is added to the persistence context, which acts like a first-level cache (this is in memory). When the actual persisting happens depends on whether you use container-managed transactions or deal with transactions yourself. The entity instance lives in memory as long as the transaction is not committed; when it is committed, the entity is persisted to the database, XML, etc.
JPA can't work with only the persistence context (L1 cache) or the explicit cache (L2 cache). It always needs to be combined with a datasource, and this datasource typically points to a database that persists to stable storage.
So, the entity is in memory only as long as the transaction (which is required for JPA persist operations) isn't committed. After that it's sent to the datasource.
If the entity manager is transaction scoped (the 'normal' case), then the L1 cache (the persistence context) is closed and the entities no longer exist there. If the L1 cache somehow bothers you, you can manage it a bit more explicitly. There are operations to clear it, and you could separate your read operations (which don't need transactions) from write operations. If there's no transaction active when reading, there's no persistence context, so an entity never becomes attached and is thus never put into this L1 cache.
The L2 cache however is not cleared when the transaction commits and entities inside it remain available for the entire application. This L2 cache must be explicitly configured and you as an application developer must indicate which entities should be cached in it. Via vendor specific mechanisms (e.g. JBoss Cache, Infinispan) you can put a max on the number of entities being cached and set/define so-called eviction policies.
Of course, nothing prevents you from letting the datasource point to an in-memory embedded DB, but this is outside the knowledge of JPA.
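The lifetime of the L1 cache described above can be pictured with a toy model. PersistenceContextSketch is a hypothetical class, and the "database" is just a function that is counted each time it is called; real providers do much more, so treat this as an illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a transaction-scoped persistence context (first-level cache).
class PersistenceContextSketch<T> {
    private final Map<Long, T> managed = new HashMap<>();
    private final Function<Long, T> database;   // stand-in for a SELECT by primary key
    int databaseHits = 0;

    PersistenceContextSketch(Function<Long, T> database) { this.database = database; }

    // First lookup of an id goes to the database; repeats are served from memory.
    T find(long id) {
        T cached = managed.get(id);
        if (cached != null) {
            return cached;
        }
        databaseHits++;
        T loaded = database.apply(id);
        if (loaded != null) {
            managed.put(id, loaded);
        }
        return loaded;
    }

    // Models the end of the transaction: managed instances are dropped.
    void close() { managed.clear(); }
}
```

Within one context, a repeated lookup by id returns the same instance without another database call; once the context is closed, the next lookup goes back to the database.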
Persistence means in short terms: you can shut down your app, and the data is not lost.
To achieve that you need a database or some sort of saving data in a way that it's not lost when you shut down the app.
To "persist" an entity means to actually save it in the database. Sure, JPA maintains some entity information in memory in the persistence context (and this is highly dependent on configuration and programming practices), but at a certain point the information will be stored in the database - for instance, when a transaction commits, or likely (but not necessarily) after flush() or merge() operations.
If you want to keep your entities after committing and for a select query, you need to use the query cache. Just Google around on that term and it should be clear to you.
We are developing a (Java SE) application which communicates with many clients via persistent TCP connections. A client connects, performs some or many operations (which are written to a SQL database), and then closes the application / disconnects from the server. We're using Hibernate JPA and manage the EntityManager lifecycle on our own, using a ThreadLocal variable. At the moment we create a new EntityManager instance on every client request, which works fine so far. Recently we did some profiling and found out that Hibernate performs a SELECT query against the DB before every UPDATE statement. That is because our entities are in the detached state, and every new EntityManager first attaches the entity to the persistence context. This leads to massive SQL overhead when the server is under load (because we have a write-heavy application), and we are trying to eliminate it.
First, we thought about the second-level cache. However, we discovered that Hibernate invalidates its query and collection caches whenever an item is added or removed.
On second thought, we are evaluating whether to keep an EntityManager open for as long as the client is logged in on the server. But I wonder whether this is a "best practice", because there are some drawbacks: thread safety, the overhead of managing the EntityManager instances, etc.
In short: we are looking for a way to get rid of those SELECT-statements before every UPDATE. Any ideas out there?
One possible way to get rid of the select statements when reattaching detached entities is to use the Hibernate-specific update() operation instead of merge().
update() unconditionally runs an UPDATE SQL statement and makes the detached object persistent. If a persistent object with the same identifier already exists in the session, it throws an exception. Thus, it's a good choice when you are sure that:
The detached object contains modified state that should be saved in the database
Saving that state is the main goal of opening a session for that request (i.e. no other operations have loaded an entity with the same id in that session)
In JPA 2.0 you can access Hibernate-specific operations as follows:
em.unwrap(Session.class).update(o);
See also:
11.6. Modifying detached objects
One possible option would be to use StatelessSession for the update statements. I've successfully used it in my 'write-heavy' application.
We are using Hibernate Spring MVC with OpenSessionInView filter.
Here is a problem we are running into (pseudo code)
transaction 1
    load object foo
transaction 1 end
update foo's properties (not calling session.save or session.update but only foo's setters)
validate foo (using hibernate validator)
if validation fails ?
    go back to edit screen
    transaction 2 (read only)
        load form backing objects from db
    transaction 2 end
    go to view
else
    transaction 3
        session.update(foo)
    transaction 3 end
The problem we have occurs when the validation fails:
foo is marked "dirty" in the Hibernate session (since we use OpenSessionInView, we have only one session throughout the HTTP request). When we load the form backing objects (such as a list of entities using an HQL query), Hibernate checks whether there are dirty objects in the session before performing the query. It sees that foo is dirty and flushes it, and when transaction 2 is committed, the updates are written to the database.
The problem is that even though it is a read-only transaction, and even though foo wasn't updated in transaction 2, Hibernate has no knowledge of which object was updated in which transaction, so it cannot flush only the objects belonging to that transaction.
Any suggestions? Has anybody run into a similar problem before?
Update: this post sheds some more light on the problem: http://brian.pontarelli.com/2007/04/03/hibernate-pitfalls-part-2/
You can run a get on foo to put it into the hibernate session, and then replace it with the object you created elsewhere. But for this to work, you have to know all the ids for your objects so that the ids will look correct to Hibernate.
There are a couple of options here. First, you don't actually need transaction 2: since the session is open, you could just load the backing objects from the db, thus avoiding the dirty check on the session. The other option is to evict foo from the session after it is retrieved and later use session.merge() to reattach it when you want your changes to be stored.
With Hibernate it is important to understand exactly what is going on under the covers. At every commit boundary it will attempt to flush all changes to objects in the current session, regardless of whether the changes were made in the current transaction or in any transaction at all, for that matter. This is why you don't actually need to call session.update() for any object that is already in the session.
Hope this helps
There is a design issue here. Do you think an ORM is a transparent abstraction of your datastore, or do you think it's a set of data manipulation libraries? I would say that Hibernate is the former. Its whole reason for existing is to remove the distinction between your in-memory object state and your database state. It does provide low-level mechanisms to allow you to pry the two apart and deal with them separately, but by doing so you're removing a lot of Hibernate's value.
So very simply - Hibernate = your database. If you don't want something persisted, don't change your persistent objects.
Validate your data before you update your domain objects. By all means validate domain objects as well, but that's a last line of defense. If you do get a validation error on a persistent object, don't swallow the exception. Unless you prevent it, Hibernate will do the right thing, which is to close the session there and then.
What about using Session.clear() and/or Session.evict()?
What about setting singleSession=false on the filter? That might put your operations into separate sessions so you don't have to deal with the 1st level cache issues. Otherwise you will probably want to detach/attach your objects manually as the user above suggests. You could also change the FlushMode on your Session if you don't want things being flushed automatically (FlushMode.MANUAL).
Implement a service layer, take a look at Spring's @Transactional annotation, and mark your methods as @Transactional(readOnly=true) where applicable.
Your flush mode is probably set to auto, which means you don't really have control of when a DB commit happens.
You could also set your flush mode to manual, and your services/repos will only try to synchronize the db with your app when you tell them to.
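The difference between automatic and manual flushing can be pictured with a toy model. FlushSketch is a hypothetical class, not Hibernate's Session: it only tracks which modified objects have been written, and it assumes that in automatic mode every query flushes pending changes first.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a session that tracks dirty objects and flushes them before queries.
class FlushSketch {
    enum Mode { AUTO, MANUAL }

    private final Mode mode;
    private final List<String> dirty = new ArrayList<>();
    final List<String> written = new ArrayList<>();   // what reached the "database"

    FlushSketch(Mode mode) { this.mode = mode; }

    // Models calling a setter on an object that is attached to the session.
    void modify(String entity) { dirty.add(entity); }

    // Models running a query: in AUTO mode pending changes are written first.
    void query() {
        if (mode == Mode.AUTO) {
            flush();
        }
    }

    // Writes all pending changes, regardless of which transaction made them.
    void flush() {
        written.addAll(dirty);
        dirty.clear();
    }
}
```

In the model, a modified foo reaches the "database" as a side effect of an unrelated query in AUTO mode, while in MANUAL mode nothing is written until flush() is called explicitly.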