Hibernate LAZY Fetch VS Optimistic Locking version - java

Sorry if the answer was already response, but I tried to find out the solution and I could not find anything clear yet.
My question is, there´s any relationships between LAZY fetch relationships, which I know they have the control to know if they have to JOIN or not to other entities or initialize from database if they´re dirty on session, with the principle of optimistic locking VERSION?.
As far as I can understand the optimistic locking VERSION is more necessary if we have to handle multiple transactions at the same time. Is that correct?.
If all the transactions that we do in our applications are done sequentially, it is enough use LAZY fetch to manage when JOINS have to being done?. Or add version give us any add of value.
Thanks!

They are two completely disparate concepts. You only hit a "lazy load" if you get or set a lazily loaded relationship.
If you're using optimistic locking and your row is on V2, it just prevents submission of a modified V2 from one client (which results in the version being upped to 3) and then a different V2 from another client, forcing them to reload the data and submit a later version.
If your logic hits the lazily loaded relationships which then hit thousands of other relationships and ends up loading millions of rows, you will have a performance problem, not a versioning one. In which case you may need to up your batch sizes or maybe do some fetch joins to ensure whatever it is you want is loaded in one block rather than thousands of sequential SQL queries.
So, different problem spaces entirely.
If you're trying to update a very complex object graph, where your alterations go deep into that graph, you may hit interesting optimistic locking problems as ensuring an entire tree's "version" is the same is difficult.
UPDATE: For clarification.
If you have a Car (which has singular properties such as make, model, registration number) and Wheels in multiple. This would be a 1:0..n relationship, represented as two tables, a Car table and a Wheel table, where the Wheel has an FK back to Car. (For the purposes of this post, we will ignore many-to-many relationships).
If you lazy load your wheels, then unless you're interested if you have rim spinners, tyres, locking nuts etc, you never need load the Wheel records in, they're not relevant if you only need the registration number.
Your Car record is on V1, it has a registration number of AB1212
If I, as the Vehicle registrar of Moldova update it to AC4545 and submit with V1 (the current version), I will succeed and the version number will be incremented. I will not hit the Wheels unless I need to. If at the same time, the Subaltern Vehicle registrar in the other room tries to do the same thing on V1, it will fail with an StaleObjectException, again, not hitting the Wheels and thus not invoking a lazy load.

The lazy fetch proxy will throw a LazyInitializationException if it tries to fetch data that was altered by a different transaction(if that happens) with optimistic locking.
It's hard to help without any code or a good question, but as long as you keep all your initializations within a #Transactional code block, you should encounter much trouble.
That being said, you are trying to compare two (functionally) very different things...
Hope this helps.

Related

Hibernate best pratices for fetching entities

1) When fetching entities from hibernate i always close the session after fetching, and often i need to fetch same entities but at different times (different sessions)
And then i need to perform some operations on the fetched entities, and there i get some problems when it comes to updating, since i perform different operations on different entities (which are exactly the same in the database)
Is there any good pratices to avoid such problem ?
2)- When updating entities from a software which is working in a network, often 2 different computers does some different operations on same entities (same in the database), but when updating, everything will be corrupted.
For example, let consider the fact of updating the quantity of a product after a sale. After a sale the quantity of the product should be less than it was, but once 2 different computers does a sale on pre-fetched product, they will surely be a wrong value in the database as i'm updating the product using jpa update() function.
Is there any good pratices also for such problems ?
Thanks and sorry if it's too abstract and unclear.
On 1), you shouldn't to that without expecting concurrency problems on updating the entity informations.
On 2), you can use two strategies to deal with: Optimistic Lock or Pessimistic Lock. The Hibernate support both strategies.
There is no right answer for your question. But if would like only to prevent the problem, the Optimistic Lock seems a good solution. When two computers try to update the field, the last to write the quantity on database will receive an error and the transaction will rollback.
I ended up by updating entities using native HQL queries (for the ones who would face the same issue) then fetching again the entities (peforming a select query again) on update finished.

Concurrency with Hibernate in Spring

I found a lot of posts regarding this topic, but all answers were just links to documentations with no example code, i.e., how to use concurrency in practice.
My situation: I have an entity House with (for simplyfication) two attributes, number (the id) and owner. The database is initialized with 10 Houses with number 1-10 and owner always null.
I want to assign a new owner to the house with currently no owner, and the smallest number. My code looks like this:
#Transactional
void assignNewOwner(String newOwner) {
//this is flagged as #Transactional too
House tmp = houseDao.getHouseWithoutOwnerAndSmallestNumber();
tmp.setOwner(newOwner);
//this is flagged as #Transactional too
houseDao.update(tmp);
}
For my understanding, although the #Transactional is used, the same House could be assigned twice to different owners, if two requests fetch the same empty House as tmp. How do I ensure this can not happen?
I know, including the update in the selection of the empty House would solve the issue, but in near future, I want to modify/work with the tmp object more.
Optimistic
If you add a version column to your entity / table then you could take advantage of a mechanism called Optimistic Locking. This is the most proficient way of making sure that the state of an entity has not changed since we obtained it in a transactional context.
Once you createQuery using the session you can then call setLockMode(LockModeType.OPTIMISTIC);
Then, just before the transaction is commited, the persistence provider would query for the current version of that entity and check whether it has been incremented by another transaction. If so, you would get an OptimisticLockException and a transaction rollback.
Pessimistic
If you do not version your rows, then you are left with pessimistic lockin which basically means that you phycically create a lock for queries entities on the database level and other transactions cannot read / update those certain rows.
You achieve that by setting this on the Query object:
setLockMode(LockModeType.PESSIMISTIC_READ);
or
setLockMode(LockModeType.PESSIMISTIC_WRITE);
Actually it's pretty easy - at least in my opinion and I am going to abstract away of what Hibernate will generate when you say Pessimistic/Optimistic. You might think this is SELECT FOR UPDATE - but it's not always the case, MSSQL AFAIK does not have that...
These are JPA annotations and they guarantee some functionality, not the implementation.
Fundamentally they are entire different things - PESSIMISTIC vs OPTIMISTIC locking. When you do a pessimistic locking you sort of do a synchronized block at least logically - you can do whatever you want and you are safe within the scope of the transaction. Now, whatever the lock is being held for the row, table or even page is un-specified; so a bit dangerous. Usually database may escalate locks, MSSQL does that if I re-call correctly.
Obviously lock starvation is an issue, so you might think that OPTIMISTIC locking would help. As a side note, this is what transactional memory is in modern CPU; they use the same thinking process.
So optimistically locking is like saying - I will mark this row with an ID/Date, etc, then I will take a snapshot of that and work with it - before committing I will check if that Id has a changed. Obviously there is contention on that ID, but not on the data. If it has changed - abort (aka throw OptimisticLockException) otherwise commit the work.
The thing that bothers everyone IMO is that OptimisticLockException - how do you recover from that? And here is something you are not going to like - it depends. There are apps where a simple retry would be enough, there are apps where this would be impossible. I have used it in rare scenarios.
I usually go with Pessimistic locking (unless Optimistic is totally not an option). At the same time I would look of what hibernate generates for that query. For example you might need an index on how the entry is retrieved for the DB to actually lock just the row - because ultimately that is what you would want.

HQL joined query to eager fetch a large number of relationships

My project has recently discovered that Hibernate can take multiple levels of relationship and eager fetch them in a single join HQL to produce the filled object we need. We love this feature, figuring it would outperform a lazy fetch circumstance.
Problem is, we hit a situation where a single parent has about a dozen direct relationships, an a few subrelationships off of that, and a few of them have several dozen rows in a few instances. The result is a pretty large cross-product that results in the hql spinning it's wheels virtually forever. We turned logging up to 11 and saw more than 100000 iterations before we gave up and killed it.
So clearly, while this technique is great for some situations, it has limits like everything in life. But what is the best performing alternative in hibernate for this? We don't want to lazy-load these, because we'll get into an N+1 situation that will be even worse.
I'd ideally like to have Hibernate pre-fetch all the rows and details, but do it one relationship at a time, and then hydrate the right detail object to the right parent, but I have no idea if it does such a thing.
Suggestions?
UPDATE:
So we got the SQL this query generated, it turns out that I misdiagnosed the problem. The cross product is NOT that huge. We ran the same query in our database directly and got 500 rows returned in just over a second.
Yet we saw very clearly in the hibernate logging it making 100K iterations. Is it possible Hibernate can get caught in a loop in your relationships or something?
Or maybe this should be asked as a new question?
Our team uses the special strategy to work with associations. Collections are lazy, single relations are lazy too, except references with simply structure (for an example a countries reference). And we use fluent-hibernate to load what we need in a concrete situation. It is simply because of fluent-hibernate supports nested projections. You can refer this unit test to see how complex object net can be partially loaded. A code snippet from the unit test
List<Root> roots = H.<Root> request(Root.class).proj(Root.ROOT_NAME)
.innerJoin("stationarFrom.stationar", "stationar")
.proj("stationar.name", "stationarFrom.stationar.name")
.eq(Root.ROOT_NAME, rootName).transform(Root.class).list();
See also
How to transform a flat result set using Hibernate

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrite the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this
background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
Google's JDO plugin uses the low-level API setProperty() method obviously. The log even tells you what low level calls are made (in terms of PUT and GET). Moving to some other API will not on its own solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions; no queries without ancestors, only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (ie, lookup keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that it slows the system down and wastes effort during retries. However, you will get at least one completed transaction per second (maybe a bit less if you have XG transactions) throughput.
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
The way I see it, you can either prevent the first task from updating the object because certain values have changed from when the task was first launched.
Or you can you embed the object's values within the task request so that the 2nd calc task will restore the object state with consistent value and calcuated members.

Strategies for performance optimizations on an inherited EJB3 application

I was asked to have a look at a legacy EJB3 application with significant performance problems. The original author is not available anymore so all I've got is the source code and some user comments regarding the unacceptable performance. My personal EJB3 skill are pretty basic, I can read and understand the annotated code but that's all until know.
The server has a database, several EJB3 beans (JPA) and a few stateless beans just to allow CRUD on 4..5 domain objects for remote clients. The client itself is a java application. Just a few are connected to the server in parallel. From the user comments I learned that
the client/server app performed well in a LAN
the app was practically unusable on a WAN (1MBit or more) because read and update operations took much too long (up to several minutes)
I've seen one potential problem - on all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations, is it advisable to start tuning with the fetching strategies?
But that would not explain performance issues on update operations, or would it? Update is handled by an EntityManager, the client just passes the domain object to the manager bean and persisting is done with nothing but manager.persist(obj). Maybe the domain objects that are sent to the server are just too big (maybe a side effect of the EAGER strategy).
So my actual theory is that too many bytes are sent over a rather slow network and I should look at reducing the size of result sets.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
On all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations?
Depending on the relations betweens classes, you might be fetching much more (the whole database?) than actually wanted when retrieving entities?
is it advisable to start tuning with the fetching strategies?
I can't say that making all relations EAGER is a very standard approach. To my experience, you usually keep them lazy and use "Fetch Joins" (a type of join allowing to fetch an association) when you want to eager load an association for a given use case.
But that would not explain performance issues on update operations, or would it?
It could. I mean, if the app is retrieving a big fat object graph when reading and then sending the same fat object graph back to update just the root entity, there might be a performance penalty. But it's kinda weird that the code is using em.persist(Object) to update entities.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
The obvious ones include:
Retrieving more data than required
N+1 requests problems (bad fetching strategy)
Poorly written JPQL queries
Non appropriate inheritance strategies
Unnecessary database hits (i.e. lack of caching)
I would start with writing some integration tests or functional tests before touching anything to guarantee you won't change the functional behavior. Then, I would activate SQL logging and start to look at the generated SQL for the major use cases and work on the above points.
From DBA position.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
Turn off caching
Enable sql logging Ejb3/Hibernate generates by default a lots of extremely stupid queries.
Now You see what I mean.
Change FetchType.EAGER to FetchType.LAZY
Say "no" for big business logic between em.find em.persist
Use ehcache http://ehcache.org/
Turn on entity cache
If You can, make primary keys immutable ( #Column(updatable = false, ...)
Turn on query cache
Never ever use Hibernate if You want big performance:
http://www.google.com/search?q=hibernate+sucks
I my case a similar performance problem wasn't depending on the fetch strategy. Or lets say it was not really possible to change the business logic in the existing fetch strategies. In my case the solution was simply adding indices.
When your JPA Object model have a lot of relationsships (OneToOne, OneToMany, ...) you will typical use JPQL statements with a lot of joins. This can result in complex SQL translations. When you take a look at the datamodel (generated by the JPA) you will recognize that there are no indices for any of your table rows.
For example if you have a Customer and a Address object with an oneToOne relationship everything will work well on the first look. Customer and Address have an foreign key. But if you do selections like this
Select c from Customer as c where c.address.zip='8888'
you should take care about your table column 'zip' in the table ADDRESS. JPA will not create such an index for you during deployment. So in my case I was able to speed up the database performance by simply adding indices.
An SQL Statement in your database looks like this:
ALTER TABLE `mydatabase`.`ADDRESS` ADD INDEX `zip_index`(`IZIP`);
In the question, and in the other answers, I'm hearing a lot of "might"s and "maybe"s.
First find out what's going on. If you haven't done that, we're all just poking in the dark.
I'm no expert on this kind of system, but this method works on any language or OS.
When you find out what's making it take too long, why don't you summarize it here?
I'm especially interested to know if it was something that might have been guessed.

Categories