1) When fetching entities with Hibernate I always close the session after fetching, and I often need to fetch the same entities again at different times (in different sessions).
I then need to perform some operations on the fetched entities, and that is where I run into problems when updating, since I perform different operations on different entity instances (which correspond to exactly the same rows in the database).
Are there any good practices to avoid such problems?
2) When updating entities from software that runs over a network, two different computers often perform different operations on the same entities (the same rows in the database), but when they update, the data ends up corrupted.
For example, consider updating the quantity of a product after a sale. After a sale the quantity of the product should be lower than it was, but once two different computers each record a sale against a pre-fetched product, there will surely be a wrong value in the database, since I'm updating the product with the JPA update() function.
Are there any good practices for such problems as well?
Thanks and sorry if it's too abstract and unclear.
On 1), you shouldn't do that without expecting concurrency problems when updating the entity information.
On 2), you can use two strategies to deal with it: optimistic locking or pessimistic locking. Hibernate supports both strategies.
There is no single right answer to your question. But if you only want to prevent the problem, optimistic locking seems like a good solution. When two computers try to update the field, the last one to write the quantity to the database will receive an error and its transaction will be rolled back.
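If it helps, here is a minimal sketch of optimistic locking with JPA/Hibernate, assuming a hypothetical Product entity like the one in the question; the @Version field is all that is needed to enable it:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Product {

    @Id
    private Long id;

    private int quantity;

    // Hibernate adds this column to the WHERE clause of every UPDATE; if
    // another transaction has already bumped it, the update matches no rows
    // and Hibernate raises an optimistic locking failure instead of silently
    // overwriting the other sale.
    @Version
    private long version;

    // getters and setters omitted
}

The losing transaction typically surfaces as an OptimisticLockException (or Hibernate's StaleObjectStateException), which the application can catch in order to re-read the product and retry the sale.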
I ended up updating the entities with HQL update queries (for anyone who faces the same issue), then fetching the entities again (performing a select query again) once the update finished.
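For reference, a hedged sketch of that approach, assuming a hypothetical Product entity and an open Hibernate Session (productId and soldQuantity are placeholders):

import org.hibernate.Session;

public class ProductSaleDao {

    public Product recordSale(Session session, Long productId, int soldQuantity) {
        // Bulk HQL update: the arithmetic happens in the database, so two
        // concurrent sales both subtract from the current value instead of
        // overwriting each other with a stale pre-fetched quantity.
        session.createQuery(
                "update Product p set p.quantity = p.quantity - :sold where p.id = :id")
                .setParameter("sold", soldQuantity)
                .setParameter("id", productId)
                .executeUpdate();

        // Re-fetch the entity afterwards, since bulk updates bypass the
        // persistence context and any in-memory copy may be stale.
        return (Product) session.createQuery("from Product p where p.id = :id")
                .setParameter("id", productId)
                .uniqueResult();
    }
}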
Related
I am using Spring Boot and Hibernate.
Some complex logic, dictated by business, needs to use various nested fields, which traverse various DB relationships (again, some are NxN, Nx1, 1xN, 1x1).
I encountered the N+1 problem, and I solved it at first with HQL, but some queries need several joins and the result sets become unmanageable.
I started working on a custom utility that collects the ids of the things that need to be fetched, fetches them all at once and uses the setters to populate the fields on the starting objects. This utility works for ManyToOne relationships, but is still inefficient with ManyToMany relationships, because it falls back into the N+1 problem when I collect the ids (it queries the join table once per object via the getter).
How can I solve this? Has this problem really not been solved yet? Am I missing some obvious setting that solves this automagically?
EDIT:
I made a toy example with some commentary: https://github.com/marcotama/n-1-queries-example
I faced the same situation and found 3 ways to solve it:
increase the fetch (batch) size for the dependent attribute so that the queries are executed in batches
write a custom query for the purpose
define an entity graph and map it to the corresponding attributes
I personally preferred the 3rd option, as it was convenient and cleaner with Spring Data JPA (a sketch is shown after the links below).
You can refer to examples in the comments of the answers below:
Spring Data JPA And NamedEntityGraphs
What is the solution for the N+1 issue in JPA and Hibernate?
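To illustrate the 3rd option, here is a minimal sketch with Spring Data JPA, using hypothetical Author and Book entities (the names, fields and repository are assumptions, not taken from the question):

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.NamedAttributeNode;
import javax.persistence.NamedEntityGraph;
import javax.persistence.OneToMany;
import org.springframework.data.jpa.repository.EntityGraph;
import org.springframework.data.jpa.repository.JpaRepository;

@Entity
@NamedEntityGraph(name = "Author.withBooks",
        attributeNodes = @NamedAttributeNode("books"))
class Author {

    @Id
    @GeneratedValue
    Long id;

    String name;

    // Book is another hypothetical entity (not shown). The collection stays
    // lazy in the mapping; the entity graph overrides that per query.
    @OneToMany(mappedBy = "author", fetch = FetchType.LAZY)
    List<Book> books;
}

interface AuthorRepository extends JpaRepository<Author, Long> {

    // Applying the named entity graph makes Spring Data / Hibernate fetch the
    // books together with the authors, instead of one extra query per author.
    @EntityGraph(value = "Author.withBooks")
    List<Author> findAll();
}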
Write fetch logic on your own.
E.g. you have an Author which has Books and AuthorDevices.
You can join-fetch the authors with their books. Then you can separately fetch the author devices with a repository query along the lines of "where author_id IN :ids", passing the ids collected via authorsList.stream().map(Author::getId). Then you should detach the authors, iterate over the author devices and assign each one to the appropriate author's devices list. I think it's the only adequate solution for situations where you need to join-fetch more than one relation.
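A minimal sketch of that two-query approach, assuming hypothetical Author, Book and AuthorDevice entities with the obvious getters and setters:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.persistence.EntityManager;

public class AuthorDeviceFetcher {

    public List<Author> loadAuthorsWithBooksAndDevices(EntityManager em) {
        // 1) Join-fetch the books collection in a single query.
        List<Author> authors = em.createQuery(
                "select distinct a from Author a join fetch a.books", Author.class)
                .getResultList();

        List<Long> ids = authors.stream()
                .map(Author::getId)
                .collect(Collectors.toList());

        // 2) Fetch all devices for those authors in one IN query.
        List<AuthorDevice> devices = em.createQuery(
                "select d from AuthorDevice d where d.author.id in :ids", AuthorDevice.class)
                .setParameter("ids", ids)
                .getResultList();

        // 3) Detach the authors and wire the devices onto them by hand, so the
        //    lazy author.getDevices() query is never triggered.
        authors.forEach(em::detach);
        Map<Long, List<AuthorDevice>> byAuthor = devices.stream()
                .collect(Collectors.groupingBy(d -> d.getAuthor().getId()));
        authors.forEach(a -> a.setDevices(
                byAuthor.getOrDefault(a.getId(), Collections.emptyList())));
        return authors;
    }
}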
My project has recently discovered that Hibernate can take multiple levels of relationships and eager-fetch them in a single join HQL query to produce the fully populated object we need. We love this feature, figuring it would outperform lazy fetching.
The problem is, we hit a situation where a single parent has about a dozen direct relationships, and a few sub-relationships off of those, and a few of them have several dozen rows in some instances. The result is a pretty large cross-product that leaves the HQL spinning its wheels virtually forever. We turned logging up to 11 and saw more than 100,000 iterations before we gave up and killed it.
So clearly, while this technique is great for some situations, it has limits like everything in life. But what is the best-performing alternative in Hibernate for this? We don't want to lazy-load these, because we'll get into an N+1 situation that will be even worse.
I'd ideally like to have Hibernate pre-fetch all the rows and details, but do it one relationship at a time, and then hydrate the right detail object to the right parent, but I have no idea if it does such a thing.
Suggestions?
UPDATE:
So we got the SQL this query generated, and it turns out that I misdiagnosed the problem. The cross-product is NOT that huge. We ran the same query directly against our database and got 500 rows back in just over a second.
Yet we saw very clearly in the hibernate logging it making 100K iterations. Is it possible Hibernate can get caught in a loop in your relationships or something?
Or maybe this should be asked as a new question?
Our team uses a particular strategy to work with associations. Collections are lazy, and single-valued relations are lazy too, except references with a simple structure (for example, a countries reference). And we use fluent-hibernate to load what we need in each concrete situation. This is simple because fluent-hibernate supports nested projections. You can refer to this unit test to see how a complex object net can be partially loaded. A code snippet from the unit test:
List<Root> roots = H.<Root> request(Root.class).proj(Root.ROOT_NAME)
.innerJoin("stationarFrom.stationar", "stationar")
.proj("stationar.name", "stationarFrom.stationar.name")
.eq(Root.ROOT_NAME, rootName).transform(Root.class).list();
See also
How to transform a flat result set using Hibernate
Sorry if this question has already been answered, but I tried to find a solution and could not find anything clear yet.
My question is: is there any relationship between LAZY fetch associations, which as I understand control whether to JOIN to other entities or initialize them from the database if they are dirty in the session, and the principle of optimistic locking with VERSION?
As far as I understand, the optimistic locking VERSION is mostly necessary if we have to handle multiple transactions at the same time. Is that correct?
If all the transactions in our application are executed sequentially, is it enough to use LAZY fetch to manage when JOINs have to be done? Or does adding a version give us any added value?
Thanks!
They are two completely disparate concepts. You only hit a "lazy load" if you get or set a lazily loaded relationship.
If you're using optimistic locking and your row is on V2, it simply lets one client submit a modified V2 (which bumps the version to V3) and then rejects a different V2 submission from another client, forcing that client to reload the data and submit against the later version.
If your logic hits the lazily loaded relationships, which then hit thousands of other relationships and end up loading millions of rows, you will have a performance problem, not a versioning one. In that case you may need to increase your batch sizes or do some fetch joins to ensure whatever it is you want is loaded in one block rather than through thousands of sequential SQL queries.
So, different problem spaces entirely.
If you're trying to update a very complex object graph, where your alterations go deep into that graph, you may hit interesting optimistic locking problems as ensuring an entire tree's "version" is the same is difficult.
UPDATE: For clarification.
Say you have a Car (which has singular properties such as make, model, registration number) and multiple Wheels. This would be a 1:0..n relationship, represented as two tables, a Car table and a Wheel table, where the Wheel has an FK back to Car. (For the purposes of this post, we will ignore many-to-many relationships.)
If you lazy-load your wheels, then unless you're interested in whether you have rim spinners, tyres, locking nuts etc., you never need to load the Wheel records in; they're not relevant if you only need the registration number.
Your Car record is on V1 and has a registration number of AB1212.
If I, as the Vehicle Registrar of Moldova, update it to AC4545 and submit with V1 (the current version), I will succeed and the version number will be incremented. I will not hit the Wheels unless I need to. If at the same time the Subaltern Vehicle Registrar in the other room tries to do the same thing on V1, it will fail with a StaleObjectStateException, again without hitting the Wheels and thus without invoking a lazy load.
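A minimal sketch of that mapping, just to show how the two concepts sit side by side (field names are illustrative, and the Wheel entity is not shown):

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Version;

@Entity
public class Car {

    @Id
    @GeneratedValue
    private Long id;

    private String make;
    private String model;
    private String registrationNumber;

    // Optimistic locking: bumped on every update of the Car row; a client
    // submitting a stale version gets a StaleObjectStateException.
    @Version
    private long version;

    // Lazy: Wheel rows are only loaded if some code navigates getWheels(),
    // so a registration-number update never touches them.
    @OneToMany(mappedBy = "car", fetch = FetchType.LAZY)
    private List<Wheel> wheels;

    // getters and setters omitted
}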
The lazy fetch proxy will throw a LazyInitializationException if it tries to fetch data that was altered by a different transaction (if that happens) with optimistic locking.
It's hard to help without any code or a more concrete question, but as long as you keep all your initializations within a @Transactional code block, you shouldn't run into much trouble.
That being said, you are trying to compare two (functionally) very different things...
Hope this helps.
We are using Hibernate 3.6.0.Final with JPA 2 and Spring 3.0.5 for a large-scale enterprise application running on Tomcat 7 and MySQL 5.5. Most of the transactions in the application live for less than a second and update 5-10 entities, but in some use cases we need to update more than 10-20K entities in a single transaction, which takes a few minutes, and hence more than 70% of the time such a transaction fails with a StaleObjectStateException because some of those entities were updated by some other transaction.
We generally maintain a version column in all tables, and in case of a StaleObjectStateException we generally retry, but since these transactions are in any case very long, even if we keep retrying I am not very sure that we'll be able to escape the StaleObjectStateException.
Also, a lot of activities keep updating these entities during busy hours, so we cannot go with a pessimistic approach because it could potentially halt many activities in the system.
Please suggest how to fix this long-transaction issue. We cannot spawn thousands of small, independent transactions, because we cannot afford messed-up data in case some transactions fail and some succeed.
Modifying 20,000 entities in one transaction is really a lot, much more than normal.
I can't give you a general solution, but here are some ideas on how to solve the problem.
1) Use LockMode.UPGRADE (see pessimistic locking). There you explicitly generate a "SELECT ... FOR UPDATE", which stops other users from modifying the rows while they are locked (see the sketch after this list).
This should avoid your problem, but if you have too many large transactions it can produce deadlocks (depending on your programming) or timeouts.
2) Change your data model to avoid these large transactions. Why do you have to update 10,000 rows? Perhaps it is possible to put this information, which is updated in so many rows, into a new table and let it be referenced only, so you have to update only a few rows in the new table.
3) Use a StatelessSession instead of a Session. In this case you are not forced to roll back after an exception; instead you can correct the problem and continue (in your case, reload the entity that was modified in the meantime and apply the large transaction's modification to the reloaded entity). This perhaps gives you the possibility to handle the critical event (a row modified in the meantime) on a row-by-row basis instead of for the complete large transaction.
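A minimal sketch of option 1 with the Hibernate 3.x Session API (Item is a hypothetical entity and the modification is a placeholder):

import java.io.Serializable;
import org.hibernate.LockMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class PessimisticUpdateExample {

    public void updateWithRowLock(SessionFactory sessionFactory, Serializable id) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            // LockMode.UPGRADE makes Hibernate issue SELECT ... FOR UPDATE,
            // so other transactions block on this row until we commit.
            Item item = (Item) session.get(Item.class, id, LockMode.UPGRADE);
            item.setQuantity(item.getQuantity() - 1); // placeholder modification
            tx.commit(); // committing releases the row lock
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}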
I was asked to have a look at a legacy EJB3 application with significant performance problems. The original author is not available anymore, so all I've got is the source code and some user comments regarding the unacceptable performance. My personal EJB3 skills are pretty basic; I can read and understand the annotated code, but that's about it so far.
The server has a database, several EJB3 beans (JPA) and a few stateless beans just to allow CRUD on 4-5 domain objects for remote clients. The client itself is a Java application. Only a few clients are connected to the server in parallel. From the user comments I learned that:
the client/server app performed well in a LAN
the app was practically unusable on a WAN (1MBit or more) because read and update operations took much too long (up to several minutes)
I've seen one potential problem - on all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations, is it advisable to start tuning with the fetching strategies?
But that would not explain performance issues on update operations, or would it? Update is handled by an EntityManager, the client just passes the domain object to the manager bean and persisting is done with nothing but manager.persist(obj). Maybe the domain objects that are sent to the server are just too big (maybe a side effect of the EAGER strategy).
So my actual theory is that too many bytes are sent over a rather slow network and I should look at reducing the size of result sets.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
On all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations?
Depending on the relations between classes, you might be fetching much more (the whole database?) than actually wanted when retrieving entities.
is it advisable to start tuning with the fetching strategies?
I can't say that making all relations EAGER is a very standard approach. In my experience, you usually keep them lazy and use "fetch joins" (a type of join that also fetches an association) when you want to eagerly load an association for a given use case.
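For example, a hedged sketch of such a fetch join, assuming a hypothetical Order entity with a lazily mapped items collection; the mapping stays lazy, and only this use case loads the items eagerly in one SQL query:

import java.util.List;
import javax.persistence.EntityManager;

public class OrderQueries {

    public List<Order> findOrdersWithItems(EntityManager em, Long customerId) {
        // "join fetch" pulls the items in the same query as the orders,
        // avoiding one extra query per order (the N+1 pattern).
        return em.createQuery(
                "select distinct o from Order o join fetch o.items "
                        + "where o.customer.id = :customerId", Order.class)
                .setParameter("customerId", customerId)
                .getResultList();
    }
}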
But that would not explain performance issues on update operations, or would it?
It could. I mean, if the app is retrieving a big fat object graph when reading and then sending the same fat object graph back to update just the root entity, there might be a performance penalty. But it's kinda weird that the code is using em.persist(Object) to update entities.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
The obvious ones include:
Retrieving more data than required
N+1 requests problems (bad fetching strategy)
Poorly written JPQL queries
Inappropriate inheritance strategies
Unnecessary database hits (i.e. lack of caching)
I would start with writing some integration tests or functional tests before touching anything to guarantee you won't change the functional behavior. Then, I would activate SQL logging and start to look at the generated SQL for the major use cases and work on the above points.
From a DBA's position:
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
Turn off caching
Enable SQL logging. EJB3/Hibernate generates a lot of extremely stupid queries by default.
Now you see what I mean.
Change FetchType.EAGER to FetchType.LAZY
Say "no" for big business logic between em.find em.persist
Use ehcache http://ehcache.org/
Turn on entity cache
If you can, make primary keys immutable (@Column(updatable = false, ...))
Turn on query cache
Never ever use Hibernate if you want big performance:
http://www.google.com/search?q=hibernate+sucks
In my case a similar performance problem did not depend on the fetch strategy. Or let's say it was not really possible to change the business logic behind the existing fetch strategies. In my case the solution was simply adding indices.
When your JPA object model has a lot of relationships (OneToOne, OneToMany, ...) you will typically use JPQL statements with a lot of joins. This can result in complex SQL translations. When you take a look at the data model (generated by JPA) you will notice that there are no indices on any of your table columns.
For example, if you have a Customer and an Address object with a OneToOne relationship, everything will look fine at first glance. Customer and Address are linked by a foreign key. But if you run selections like this:
Select c from Customer as c where c.address.zip='8888'
you should take care of the 'zip' column in the ADDRESS table. JPA will not create such an index for you during deployment. So in my case I was able to speed up database performance simply by adding indices.
The corresponding SQL statement in your database looks like this:
ALTER TABLE `mydatabase`.`ADDRESS` ADD INDEX `zip_index`(`IZIP`);
In the question, and in the other answers, I'm hearing a lot of "might"s and "maybe"s.
First find out what's going on. If you haven't done that, we're all just poking in the dark.
I'm no expert on this kind of system, but this method works on any language or OS.
When you find out what's making it take too long, why don't you summarize it here?
I'm especially interested to know if it was something that might have been guessed.