JPA and first-level cache, what's the point? - java

EntityManager maintains a first-level cache for retrieved objects, but if you want a thread-safe application you create and close an EntityManager for each transaction.
So what's the point of the first-level cache if the EntityManager (and its cache) is created and discarded for every transaction? Or is the EntityManager cache only useful if you're working in a single thread?

The point is to have an application that works like you expect it to work, and that wouldn't be slow as hell. Let's take an example:
Order order = em.find(Order.class, 3L);
Customer customer = em.find(Customer.class, 5L);
for (Order o : customer.getOrders()) {      // line A
    if (o.getId().longValue() == 3L) {
        o.setComment("hello");              // line B
        o.setModifier("John");
    }
}
System.out.println(order.getComment());     // line C
for (Order o : customer.getOrders()) {      // line D
    System.out.println(o.getComment());     // line E
}
At line A, JPA executes a SQL query to load all the orders of the customer.
At line C, what do you expect to be printed? null or "hello"? You expect "hello" to be printed, because the order you modified at line B has the same ID as the one loaded in the first line. That wouldn't be possible without the first-level cache.
At line D, you don't expect the orders to be loaded again from the database, because they have already been loaded at line A. That wouldn't be possible without the first-level cache.
At line E, you expect once again "hello" to be printed for the order 3. That wouldn't be possible without the first-level cache.
At line B, you don't expect an update query to be executed, because there might be many subsequent modifications (like in the next line), to the same entity. So you expect these modifications to be written to the database as late as possible, all in one go, at the end of the transaction. That wouldn't be possible without the first-level cache.

The first level cache serves other purposes. It is basically the context in which JPA places the entities retrieved from the database.
Performance
So, to start by stating the obvious, it avoids having to retrieve a record that has already been retrieved, serving as a form of cache during transaction processing and improving performance. Also, think about lazy loading: how could you implement it without a cache that records which entities have already been lazily loaded?
Cyclic Relationships
This caching purpose is vital to the implementation of appropriate ORM frameworks. In object-oriented languages it is common that the object graph has cyclic relationships. For instance, a Department that has Employee objects and those Employee objects belong to a Department.
Without a context (also known as a Unit of Work) it would be difficult to keep track of which records you have already mapped, so you would end up creating new objects, and in a case like this you might even end up in an infinite loop.
Keep Track of Changes: Commit and Rollback
Also, this context keeps track of the changes you do to the objects so that they can be persisted or rolled back at some later point when the transaction ends. Without a cache like this you would be forced to flush your changes to the database immediately as they happen and then you could not rollback, neither could you optimize the best moment to flush them to the store.
Object Identity
Object identity is also vital in ORM frameworks. That is, if you retrieve employee ID 123 and later need that Employee again, you should always get the same object, not a new object containing the same data.
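A minimal sketch of that guarantee, assuming a hypothetical Employee entity with a Long id and an open EntityManager em:

Employee first = em.find(Employee.class, 123L);
Employee second = em.find(Employee.class, 123L); // served from the persistence context, no second SQL query
// Within the same persistence context both references point to the very same instance.
System.out.println(first == second); // prints true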
This type of cache is not meant to be shared by multiple threads; if it were, you would compromise performance and force everyone to pay that penalty even when a single-threaded solution would be fine. Besides, you would end up with a much more complex solution that would be like killing a fly with a bazooka.
That is why, if what you need is a shared cache, you actually need a second-level cache, and there are implementations for that as well.
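For completeness, a minimal sketch of marking an entity as eligible for the shared second-level cache; this assumes a provider and cache implementation are configured (for example Hibernate with Ehcache, and <shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode> in persistence.xml):

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
@Cacheable // eligible for the shared second-level cache, across EntityManagers and threads
public class Country {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted
}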

Related

Concurrency with Hibernate in Spring

I found a lot of posts regarding this topic, but all the answers were just links to documentation with no example code showing how to handle concurrency in practice.
My situation: I have an entity House with (for simplification) two attributes, number (the id) and owner. The database is initialized with 10 Houses with numbers 1-10 and owner always null.
I want to assign a new owner to the house that currently has no owner and has the smallest number. My code looks like this:
@Transactional
void assignNewOwner(String newOwner) {
    // this is flagged as @Transactional too
    House tmp = houseDao.getHouseWithoutOwnerAndSmallestNumber();
    tmp.setOwner(newOwner);
    // this is flagged as @Transactional too
    houseDao.update(tmp);
}
As I understand it, although @Transactional is used, the same House could be assigned to two different owners if two requests fetch the same empty House as tmp. How do I ensure this cannot happen?
I know that including the update in the selection of the empty House would solve the issue, but in the near future I want to modify/work with the tmp object more.
Optimistic
If you add a version column to your entity/table, you can take advantage of a mechanism called optimistic locking. This is the most efficient way of making sure that the state of an entity has not changed since we obtained it in a transactional context.
Once you create the query from the session, you can call setLockMode(LockModeType.OPTIMISTIC); on it.
Then, just before the transaction is committed, the persistence provider queries for the current version of that entity and checks whether it has been incremented by another transaction. If so, you get an OptimisticLockException and a transaction rollback.
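A minimal sketch, assuming the House entity from the question gains a @Version column and the DAO uses a JPA TypedQuery (the JPQL and method details are illustrative):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.LockModeType;
import javax.persistence.TypedQuery;
import javax.persistence.Version;

@Entity
public class House {

    @Id
    private Long number;

    private String owner;

    @Version            // maintained by the provider, enables optimistic locking
    private Long version;

    // getters and setters omitted
}

// Inside the DAO, within the transaction:
TypedQuery<House> query = em.createQuery(
        "select h from House h where h.owner is null order by h.number", House.class);
query.setLockMode(LockModeType.OPTIMISTIC);
query.setMaxResults(1);
House house = query.getSingleResult();
// If another transaction bumps the version before this one commits,
// the commit fails with an OptimisticLockException and rolls back.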
Pessimistic
If you do not version your rows, then you are left with pessimistic locking, which basically means that you physically create a lock on the queried entities at the database level, and other transactions cannot read/update those rows.
You achieve that by setting this on the Query object:
setLockMode(LockModeType.PESSIMISTIC_READ);
or
setLockMode(LockModeType.PESSIMISTIC_WRITE);
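Applied to the question's DAO method, a minimal sketch (the JPQL and method shape are illustrative):

import javax.persistence.LockModeType;
import javax.persistence.TypedQuery;

// Illustrative body for getHouseWithoutOwnerAndSmallestNumber():
TypedQuery<House> query = em.createQuery(
        "select h from House h where h.owner is null order by h.number", House.class);
query.setLockMode(LockModeType.PESSIMISTIC_WRITE); // the selected row stays locked until the transaction ends
query.setMaxResults(1);
House house = query.getSingleResult();
// A second concurrent transaction trying to lock the same row blocks until this one
// commits; the exact locking behavior depends on the database.
return house;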
Actually it's pretty easy - at least in my opinion - and I am going to abstract away from what Hibernate will generate when you say pessimistic/optimistic. You might think this means SELECT FOR UPDATE - but that's not always the case; MSSQL, AFAIK, does not have that...
These are JPA lock modes and they guarantee certain functionality, not a particular implementation.
Fundamentally they are entirely different things - PESSIMISTIC vs OPTIMISTIC locking. When you do pessimistic locking you are, at least logically, doing something like a synchronized block - you can do whatever you want and you are safe within the scope of the transaction. Now, whether the lock is held on the row, the table or even the page is unspecified, so it is a bit dangerous. Databases may also escalate locks; MSSQL does that if I recall correctly.
Obviously lock starvation is an issue, so you might think that OPTIMISTIC locking would help. As a side note, this is what transactional memory in modern CPUs is; it uses the same line of thinking.
So optimistic locking is like saying: I will mark this row with an ID/date, etc., take a snapshot of it and work with it - before committing I will check whether that ID has changed. Obviously there is contention on that ID, but not on the data. If it has changed - abort (i.e. throw OptimisticLockException); otherwise commit the work.
The thing that bothers everyone, IMO, is that OptimisticLockException - how do you recover from it? And here is something you are not going to like - it depends. There are apps where a simple retry would be enough, and there are apps where this would be impossible. I have used it only in rare scenarios.
I usually go with pessimistic locking (unless optimistic is totally not an option). At the same time I would look at what Hibernate generates for that query. For example, you might need an index on the columns the entity is retrieved by for the DB to actually lock just the row - because ultimately that is what you want.

Spring JPA caching - should I avoid retrieving the same resource from repository a few times?

I have a following line in my code:
String productName = Utils.getProductName(productId, productRepository, language);
This util method retrieves the product using findOne(productId), but has some additional logic as well. It is used in multiple places in my code.
In one place, a few lines lower, I need the Product object, so I do following:
Product product = productRepository.findOne(productId);
Here I retrieve the Product again, so in principle we hit the database with the same query again. But I believe that JPA (Hibernate) caches the object, so I don't have to worry about it and performance won't be affected. Am I right?
Of course, I try to avoid such duplication, but in this case refactoring the getProductName method would have an impact on the other places where I use it, so I'd like to just leave it as it is. If the performance cost were noticeable, though, I'd rather tweak the code.
Yes, there is a first-level cache enabled on the entity manager. "In first level cache CRUD operations are performed per transaction basis to reduce the number of queries fired to the database."
http://www.developer.com/java/using-second-level-caching-in-a-jpa-application.html
Just be sure not to create "inconsistent" states without informing the entity manager or flushing the changes to the DB.
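A minimal sketch of why the second lookup is cheap, assuming both calls run inside the same transaction and therefore the same persistence context (names follow the question):

@Transactional
public void handleProduct(Long productId, String language) {
    // First lookup: issues a SELECT and places the Product in the persistence context.
    String productName = Utils.getProductName(productId, productRepository, language);

    // Second lookup: findOne() delegates to EntityManager.find(), which returns the
    // already-managed instance from the persistence context without another query.
    Product product = productRepository.findOne(productId);
}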

Should Hibernate Session#merge do an insert when receiving an entity with an ID?

This seems like it would come up often, but I've Googled to no avail.
Suppose you have a Hibernate entity User. You have one User in your DB with id 1.
You have two threads running, A and B. They do the following:
A gets user 1 and closes its Session
B gets user 1 and deletes it
A changes a field on user 1
A gets a new Session and merges user 1
All my testing indicates that the merge attempts to find user 1 in the DB (it can't, obviously), so it inserts a new user with id 2.
My expectation, on the other hand, would be that Hibernate would see that the user being merged was not new (because it has an ID). It would try to find the user in the DB, which would fail, so it would not attempt an insert or an update. Ideally it would throw some kind of concurrency exception.
Note that I am using optimistic locking through @Version, and that does not help matters.
So, questions:
Is my observed Hibernate behaviour the intended behaviour?
If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
If the answer to 2. is yes, why is nobody complaining about it?
Please see the text from hibernate documentation below.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance.
It clearly states: copy the state (data) of the given object onto the persistent object in the database; if the object is not there, then save a copy of that data. When we say "save a copy", Hibernate always creates a record with a new identifier.
Hibernate's merge function works something like this:
It checks the status of the entity (attached to or detached from the session) and finds that it is detached.
Then it tries to load the entity by its identifier, but does not find it in the database.
As the entity is not found, it treats the entity as transient.
A transient entity always creates a new database record with a new identifier.
Locking is applied to attached entities. If the entity is detached, Hibernate will first load it, and the version value gets refreshed from the database.
Locking is used to control concurrency problems, but this is not that kind of concurrency issue.
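A minimal sketch of the scenario from the question, written with plain Hibernate Sessions (entity and field names assumed):

// Thread A: load the user, then close the session; the instance becomes detached.
Session sessionA = sessionFactory.openSession();
User user = (User) sessionA.get(User.class, 1L);
sessionA.close();

// Thread B: delete the same row and commit.
Session sessionB = sessionFactory.openSession();
sessionB.beginTransaction();
sessionB.delete(sessionB.get(User.class, 1L));
sessionB.getTransaction().commit();
sessionB.close();

// Thread A: modify the detached instance and merge it in a new session.
user.setName("changed");
Session sessionC = sessionFactory.openSession();
sessionC.beginTransaction();
sessionC.merge(user);                // no row with id 1 exists, so the entity is treated
sessionC.getTransaction().commit();  // as transient and a new row is INSERTed
sessionC.close();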
I've been looking at JSR-220, from which Session#merge claims to get its semantics. The JSR is sadly ambiguous, I have found.
It does say:
"Optimistic locking is a technique that is used to insure that updates to the database data corresponding to the state of an entity are made only when no intervening transaction has updated that data since the entity state was read."
If you take "updates" to include general mutation of the database data, including deletes, and not just a SQL UPDATE, which I do, I think you can make an argument that the observed behaviour is not compliant with optimistic locking.
Many people agree, given the comments on my question and the subsequent discovery of this bug.
From a purely practical point of view, the behaviour, compliant or not, could lead to quite a few bugs, because it is contrary to many developers' expectations. There does not seem to be an easy fix for it. In fact, Spring Data JPA seems to ignore this issue completely by blindly using EM#merge. Maybe other JPA providers handle this differently, but with Hibernate this could cause issues.
I'm currently working around this by using Session#update. It's really ugly, and requires code to handle the case where you try to update a detached entity while a managed copy of it already exists. But it won't lead to spurious inserts either.
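A rough sketch of that workaround, under the assumption that the conflicting managed copy surfaces as a NonUniqueObjectException (helper and entity names are illustrative):

import org.hibernate.NonUniqueObjectException;
import org.hibernate.Session;

// Illustrative helper: reattach a detached entity without risking a spurious insert.
void reattach(Session session, User detached) {
    try {
        // update() never inserts: if the row no longer exists, the flush fails
        // (StaleStateException) instead of silently creating a new record.
        session.update(detached);
    } catch (NonUniqueObjectException e) {
        // A managed instance with the same identifier is already in this session;
        // decide how to reconcile the copies, e.g. evict the managed one and retry.
        session.evict(session.get(User.class, detached.getId()));
        session.update(detached);
    }
}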
1. Is my observed Hibernate behaviour the intended behaviour?
The behavior is correct. You are just trying to do operations that are not protected against concurrent data modification :) If you have to split the operation into two sessions, then in the second session find the object again before updating it and check whether it is still there, throwing an exception if not. If it is there, lock it using em.find(entityClass, primaryKey, lockModeType), or use @Version or @Entity(optimisticLock = OptimisticLockType.ALL/DIRTY/VERSION) to protect the object until the end of the transaction.
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
Probably: yes
3. If the answer to 2 is yes, why is nobody complaining about it?
Because if you protect your operations using pessimistic or optimistic locking the problem will disappear :)
The problem you are trying to solve is called a non-repeatable read.
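A minimal sketch of that re-find-and-check approach in the second transaction (names are illustrative):

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;
import javax.persistence.OptimisticLockException;

// Illustrative: re-load the row instead of merging a stale detached copy.
void applyChanges(EntityManager em, Long userId, String newName) {
    // find(...) with a lock mode re-reads the row and locks it for this transaction.
    User managed = em.find(User.class, userId, LockModeType.OPTIMISTIC);
    if (managed == null) {
        // The row was deleted by a concurrent transaction; fail instead of re-creating it.
        throw new OptimisticLockException("User " + userId + " no longer exists");
    }
    managed.setName(newName); // flushed at commit, where the version is checked
}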

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
Google's JDO plugin obviously uses the low-level API's setProperty() method under the hood. The log even tells you which low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits on transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (i.e., look up the keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
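A rough sketch of such a retry loop with the low-level Datastore API, where commit conflicts surface as java.util.ConcurrentModificationException (the property and method names are assumed):

import java.util.ConcurrentModificationException;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

// Illustrative: retry the calculation write until it commits without a conflict.
void storeCalculatedValue(Key key, long calculated) throws EntityNotFoundException {
    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
    while (true) {
        Transaction txn = ds.beginTransaction();
        try {
            Entity entity = ds.get(txn, key);             // re-read inside the transaction
            entity.setProperty("calculated", calculated); // only touch the property this task owns
            ds.put(txn, entity);
            txn.commit();
            return;                                       // success
        } catch (ConcurrentModificationException e) {
            // Another request updated the entity group first; loop and try again.
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}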
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get a throughput of at least one completed transaction per second (maybe a bit less if you use XG transactions).
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
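For example, a minimal sketch with Objectify 4's VoidWork (entity and field names assumed):

import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.VoidWork;

// Illustrative: Objectify retries this block automatically on a commit conflict.
void recalculate(final long recordId, final long calculated) {
    ofy().transact(new VoidWork() {
        @Override
        public void vrun() {
            Record record = ofy().load().type(Record.class).id(recordId).now();
            record.setCalculated(calculated);
            ofy().save().entity(record); // committed (and retried if needed) by transact()
        }
    });
}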
The way I see it, you can either prevent the first task from updating the object when certain values have changed since the task was first launched,
or you can embed the object's values within the task request so that the second calculation task restores the object state with consistent value and calculated members.

How complex can a Hibernate transaction be

This is probably more a "design" or style question: I have just been considering how complex a Hibernate transaction should or could be. I am working with an application that persists messages to a database using Hibernate.
Building the message POJO involves factoring out one-to-many relationships from the message into their respective persistent objects. For example, the message contains a "city" field. The city is extracted from the message, the database is searched for an equivalent city object, and the resulting object is added to the message POJO. All of this is done within a single transaction:
Start transaction
test for duplicate
retrieve city object
setCity(cityObject) in message object
retrieve country object
setCountry(countryObject) in message object
persist message object
commit / end transaction
In fact the actual transactions are considerably more complex. Is this a reasonable structure, or should each task be completed within its own transaction (rather than all tasks in one transaction)? I guess the second question relates to best practice in designing the tasks within a transaction. I understand that some tasks need to be grouped for referential integrity; however, this is not always the case.
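A minimal sketch of the flow described above, with entity, DAO and field names assumed for illustration:

// Illustrative outline of the single-transaction version with a Hibernate Session.
Transaction tx = session.beginTransaction();
try {
    if (messageDao.isDuplicate(incoming)) {             // test for duplicate (assumed helper)
        tx.rollback();
        return;
    }
    City city = cityDao.findByName(incoming.getCityName());             // retrieve city object
    Country country = countryDao.findByCode(incoming.getCountryCode()); // retrieve country object

    Message message = new Message(incoming.getBody());
    message.setCity(city);                               // setCity(cityObject)
    message.setCountry(country);                         // setCountry(countryObject)

    session.persist(message);                            // persist message object
    tx.commit();                                         // all steps succeed or fail as one unit
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
}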
Whatever you put within your outer transaction boundary, the question is whether you can successfully roll back each action.
Bundle related actions within a boundary, and keep it as simple as possible.
Transactions should be grouped according to the business requirements, not technical complexity. If you have N operations that must succeed or fail together as a unit, then that's what the code should support. It should be more of a business consideration than a technical one.
Multiple transactions only make sense if the DB isn't left in an inconsistent state between them, because any single transaction could fail. Nested transactions may make sense if a block of activity must be atomic on its own and the overall transaction depends on those atomic units.
