How to speed up List access in JPA - java

In my mapper class, I always have to retrieve a list of departments from my company entity :
@OneToMany(mappedBy = "company", orphanRemoval = true, cascade = CascadeType.ALL)
public List<CompanyDepartments> getDepartments()
{
    return departments; // java.util.List field
}
There are a couple of thousand departments (about 2,000+). I am trying to reduce the time taken to fetch company data, and I want to begin with the departments, which are fairly static in nature.
Option 1: I can always have a private method in the mapper which populates a cache on the first load and returns from the cache from then on.
But is there anything simpler? I am probably not the first one to face this and wanted to know how I could approach it. Is there a known annotation that can be used, like a @Singleton annotation on the variable in the entity? (There is no such annotation, obviously.) What is the simplest way to make this list a singleton?
This is a typical Spring MVC 3 application using Spring Data JPA for database interaction.

I suggest this link, which addresses the N+1 query problem:
How can I resolve the N+1 Selects problem?
Moreover, you could put a second-level cache on your search service,
if the data does not change during the life cycle of the application:
http://docs.spring.io/spring/docs/3.1.0.M1/spring-framework-reference/html/cache.html
Another approach is to add indexes on the database.
I hope this answers your question.
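If a full second-level cache is more than you need, the "populate on first load" idea from the question can be sketched in plain Java. This is a hypothetical helper (class and method names are my own, not a Spring or JPA API); it only suits data that is effectively static, as the departments are here, since it never evicts or refreshes entries:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Memoizing wrapper: the first call per key invokes the loader
// (e.g. a JPA query); subsequent calls return the cached list.
class CachedLoader<K, V> {
    private final Map<K, List<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, List<V>> loader;

    CachedLoader(Function<K, List<V>> loader) {
        this.loader = loader;
    }

    List<V> departmentsOf(K key) {
        return cache.computeIfAbsent(key, loader);
    }
}
```

A real second-level cache (Hibernate's, or Spring's @Cacheable) additionally handles eviction and invalidation, which this sketch deliberately ignores.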

What you're experiencing is part of the "object-relational impedance mismatch". One solution would be to extend your object model so that you can use a left join fetch to load multiple companies including their departments using only one SQL statement.
The problem with this technique is that you can populate only one list at a time. Although it's nice and easy to define many sub-lists in the world of objects, it's hard to load all these lists at the same time (and therefore efficiently) from a relational database. However, it can be done on Oracle using object- or XMLType queries.
The straightforward way to get such data out of an RDBMS is to write custom queries that match exactly the task at hand and provide only the data that is actually needed. Therefore you'd need not just one company class, but many - one for each task - you'd actually use inheritance instead of attribution.
BTW, that is why ORM is still considered the Vietnam of Computer science - easy to get started, but hard to succeed with.
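A minimal sketch of such a fetch-join query, with entity and field names assumed from the question's Company/Department model:

```java
// One SQL statement loads companies together with their departments,
// avoiding one extra query per company. "distinct" removes the duplicate
// Company rows the join would otherwise produce.
TypedQuery<Company> query = entityManager.createQuery(
        "select distinct c from Company c left join fetch c.departments",
        Company.class);
List<Company> companies = query.getResultList();
```

Keep in mind that fetch-joining more than one collection in the same query multiplies the result rows, which is exactly the sub-list limitation described above.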

Related

In Hibernate, how to avoid the N+1 query problem and large results sets due to multiple joins

I am using Spring Boot and Hibernate.
Some complex logic, dictated by business, needs to use various nested fields, which traverse various DB relationships (again, some are NxN, Nx1, 1xN, 1x1).
I encountered the N+1 problem, and I solved it at first with HQL, but some queries need several joins and the result sets become unmanageable.
I started working on a custom utility that collects the ids of things that need to be fetched, fetches them all at once, and uses the setters to populate the fields on the starting objects. This utility works for ManyToOne relationships, but is still inefficient with ManyToMany relationships, because it falls back into the N+1 problem when I collect the ids (as it queries the join table once per object via the getter).
How can I solve this? Has this problem really not been solved yet? Am I missing some obvious settings that solves this automagically?
EDIT:
I made a toy example with some commentary: https://github.com/marcotama/n-1-queries-example
I had faced the same situation, and there were three ways to solve it:
increase the fetch size for the dependent attribute so that the queries are executed in batches
write a custom query for the purpose
define entity graph relations and map them accordingly to attributes
I personally preferred the third option, as it was convenient to do and cleaner with Spring Data JPA.
You can refer to examples in the answers below:
Spring Data JPA And NamedEntityGraphs
What is the solution for the N+1 issue in JPA and Hibernate?
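For reference, a minimal sketch of the entity-graph option with Spring Data JPA. The Author/Book model and graph name are hypothetical, chosen only for illustration:

```java
@Entity
@NamedEntityGraph(name = "Author.books",
        attributeNodes = @NamedAttributeNode("books"))
public class Author {
    @Id @GeneratedValue
    private Long id;

    @OneToMany(mappedBy = "author")
    private List<Book> books;
}

public interface AuthorRepository extends JpaRepository<Author, Long> {
    // Loads authors together with their books in one query instead of N+1
    @EntityGraph(value = "Author.books", type = EntityGraph.EntityGraphType.LOAD)
    List<Author> findAll();
}
```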
Write the fetch logic on your own.
E.g. you have an Author which has books and author_devices.
You can join-fetch authors with their books. Then you can separately fetch the author_devices using a repository query along the lines of "where author_id IN :ids", passing authorsList.stream().map(Author::getId).collect(toList()). Then you should detach the authors, iterate over the author_devices, and assign each one to the appropriate author's devices list. I think it's the only adequate solution for situations where you need to join-fetch more than one relation.
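The "fetch separately, then reattach" step can be sketched in plain Java. Author, Device, and the field names are hypothetical stand-ins; in a real application the fetchedDevices list would come from a second repository query with an IN clause:

```java
import java.util.*;
import java.util.stream.Collectors;

class Author {
    final long id;
    final String name;
    List<Device> devices = new ArrayList<>();
    Author(long id, String name) { this.id = id; this.name = name; }
}

class Device {
    final long authorId;
    final String model;
    Device(long authorId, String model) { this.authorId = authorId; this.model = model; }
}

class DeviceAttacher {
    // One pass: group the devices returned by the IN query by author id,
    // then hand each author its own sub-list, instead of triggering one
    // lazy collection load per author (the N+1 pattern).
    static void attach(List<Author> authors, List<Device> fetchedDevices) {
        Map<Long, List<Device>> byAuthor = fetchedDevices.stream()
                .collect(Collectors.groupingBy(d -> d.authorId));
        for (Author a : authors) {
            a.devices = byAuthor.getOrDefault(a.id, Collections.emptyList());
        }
    }
}
```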

Best practices implementing list of objects in Java

I'm currently working on improving some old uni assignments moving them from serializable files to any other form of storage, mainly SQL Databases.
I understand the concept of relational database design and the similarities with OOP Classes, however, I'm not entirely sure how to approach this issue from an OOP design perspective.
Right now I have a Hotel class with a List of Rooms as property, each Room has a list of Guests as property (full code here)
Back when using files I could mark these classes with the Serializable interface and store the parent object in a single file. But when using relational DB, I store each list as a single table and use separate queries to obtain the corresponding results. Same goes for the add() operation: with databases, I can do something like Guest.add() and add all the required fields directly to the database, whereas with my current design I need to call Room.getGuestList().add() (or a similar approach).
I totally understand that neither approach is ideal, as both classes should only be worried about storing the data and not about the implementation of an add method, but even if I move this into a separate class, should I still define a List property within each class?
I'm pretty sure I'm missing a design pattern here, but I cannot find the one that would solve this problem or maybe it's just that I've been taught wrong.
Thanks for your answers
Edit: I've decided thanks to the answers provided to transform my implementation following the DAO pattern as explained in this question and the Oracle documentation.
Normally you would have 3 tables: hotels, rooms, guests.
Rooms would have a relation to hotels (a hotel id), and guests would have a relation to rooms (a room id). That's it.
Those relations can be easily reflected in OOP using some sort of ORM; JPA with Hibernate is an excellent example. Check it out. You will be able to get a hotel, its rooms, and all guests of the hotel just as you described, without writing a single SQL query in your code.
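A sketch of how the three tables map onto JPA entities. Field names and cascade choices are assumptions; adapt them to the Hotel/Room/Guest classes linked in the question:

```java
@Entity
public class Hotel {
    @Id @GeneratedValue
    private Long id;

    // one hotel -> many rooms; deleting a hotel removes its rooms
    @OneToMany(mappedBy = "hotel", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<Room> rooms = new ArrayList<>();
}

@Entity
public class Room {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Hotel hotel;   // becomes the hotel_id foreign key column

    @OneToMany(mappedBy = "room", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<Guest> guests = new ArrayList<>();
}

@Entity
public class Guest {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Room room;     // becomes the room_id foreign key column
}
```

With this mapping, persistence concerns live in repositories or DAOs, so the entities stay close to the plain data holders the question is aiming for.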

How to create different relationships with same entity in Neo4J/Spring?

I am using Spring Data Neo4j with Spring Boot v1.3.1. My Neo4j version is 2.1.6.
Let's say I have an entity Person, which can have a relation named Friend with a Set of Person. So I define a Set as one of the attributes of the entity, use the @RelatedTo annotation, and give it the type Friend.
What if I want to have multiple other relationships, all with the same entity, say Enemy, Acquaintance, etc.? Do I have to define separate attributes for all of them? Can't I pass the relationship type dynamically?
For reference:
@NodeEntity
public class Person {
    @RelatedTo(type = "FRIEND", direction = Direction.BOTH)
    public @Fetch Set<Person> friends;

    // Do I have to do it like this? This is odd.
    @RelatedTo(type = "ENEMY", direction = Direction.BOTH)
    public @Fetch Set<Person> enemies;

    // getters and setters
}
EDIT 1:
Right now I'm facing an issue with creating nodes in bulk. The problem is explained below:
After considering the approach suggested by Michael, here is what I have.
Basically, I have to create a lot of nodes in bulk. This node, Person, will have an attribute with a unique index on it; let's call it name. So, when the Friend or Enemy relations are created, I want them to be created between persons with unique names.
So, there will be two steps:
Create the Person nodes (takes a lot of time).
Create the relations between them (does not take much time, around 30-40 ms).
I tried different approaches of creating nodes in bulk.
One approach was to commit the transaction after a certain number of nodes had been saved.
I had followed this link.
I'm not sure about the performance improvement, as calling neo4jTemplate.save() still takes around 500 ms.
From my logs:
Time taken to execute save:=612 ms
Time taken to execute save:=566 ms
Is this supposed to be alright?
Another approach was using Cypher, as suggested by Michael in his blog, here.
I used a Cypher query like this :
WITH [{name: "Person1", gravity: 1},
{name: "Person2", gravity: 2}] AS group
FOREACH (person IN group |
CREATE (e:Person {label: person.name, gravity: person.gravity}))
The issue with this approach is that the nodes do get created in bulk, but the unique index on the name attribute is ignored. It seems I must commit after saving each node.
So, is there any other way, in which I will be able to create nodes in bulk in a faster manner ?
You can handle it by creating relationship entities of the different types.
Or by using Neo4jTemplate directly (createRelationshipBetween).
If you have such a dynamic setup, what would your entities look like?
You don't have to list the relationships in your entity. If they are dynamic you can also just have base attributes in your entities and access the relationships via a repository.
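As a sketch, the relationship-entity option might look like this in SDN 3.x. The type name, class name, and the idea of storing the kind as a property are assumptions for illustration:

```java
// One relationship entity instead of one Set<Person> field per relation kind.
@RelationshipEntity(type = "RELATES_TO")
public class Relation {
    @GraphId private Long id;
    @StartNode private Person from;
    @EndNode private Person to;
    // "FRIEND", "ENEMY", ... kept as a property, so new kinds
    // don't require new fields on Person
    private String kind;
}
```

The trade-off is that the relationship type in the graph itself stays fixed at RELATES_TO; truly dynamic types are easier to create via Neo4jTemplate.createRelationshipBetween or a hand-written Cypher query.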

HQL joined query to eager fetch a large number of relationships

My project has recently discovered that Hibernate can take multiple levels of relationships and eager fetch them in a single join HQL query to produce the filled object we need. We love this feature, figuring it would outperform lazy fetching.
The problem is, we hit a situation where a single parent has about a dozen direct relationships, and a few sub-relationships off of those, and a few of them have several dozen rows in some instances. The result is a pretty large cross-product that leaves the HQL spinning its wheels virtually forever. We turned logging up to 11 and saw more than 100,000 iterations before we gave up and killed it.
So clearly, while this technique is great for some situations, it has limits like everything in life. But what is the best performing alternative in hibernate for this? We don't want to lazy-load these, because we'll get into an N+1 situation that will be even worse.
I'd ideally like to have Hibernate pre-fetch all the rows and details, but do it one relationship at a time, and then hydrate the right detail object to the right parent, but I have no idea if it does such a thing.
Suggestions?
UPDATE:
So we got the SQL this query generated, and it turns out that I misdiagnosed the problem. The cross-product is NOT that huge. We ran the same query directly against our database and got 500 rows back in just over a second.
Yet we saw very clearly in the Hibernate logging that it made 100K iterations. Is it possible for Hibernate to get caught in a loop in your relationships or something?
Or maybe this should be asked as a new question?
Our team uses a special strategy for working with associations: collections are lazy, and single relations are lazy too, except for references with a simple structure (for example, a countries reference). We use fluent-hibernate to load what we need in each concrete situation. It is simple, because fluent-hibernate supports nested projections. You can refer to this unit test to see how a complex object net can be partially loaded. A code snippet from the unit test:
List<Root> roots = H.<Root>request(Root.class).proj(Root.ROOT_NAME)
        .innerJoin("stationarFrom.stationar", "stationar")
        .proj("stationar.name", "stationarFrom.stationar.name")
        .eq(Root.ROOT_NAME, rootName).transform(Root.class).list();
See also
How to transform a flat result set using Hibernate

Custom hibernate entity persister

I am in the process of performance testing/optimizing a project that maps
a document <--> Java object tree <--> mysql database
The document, Java classes, database schema and logic for mapping is orchestrated with HyperJaxb3. The ORM piece of it is JPA provided by hibernate.
There are about 50 different entities and obviously lots of relationships between them. A major feature of the application is to load the documents and then reorganize the data into new documents; all the pieces of each incoming document eventually gets sent out in one outgoing document. While I would prefer to not be living in the relational world, the transactional semantics are a very good fit for this application - there is a lot of money and government regulation involved, so we need to make sure everything gets delivered exactly once.
Functionally, everything is going well and performance is decent (after a fair amount of tweaking). Each document is made up of a few thousand entities which end up creating a few thousand rows in the database. The documents vary in size, and insert performance is pretty much proportional to the number of rows that need to be inserted (no surprise there).
I see the potential for a significant optimization, and this is where my question lies.
Each document is mapped to a tree of entities. The "leaf" half of the tree contains lots of detailed information that is not used in the decisions for how to generate the outgoing documents. In other words, I don't need to be able to query/filter by the contents of many of the tables.
I would like to map the appropriate entity sub-trees to blobs, and thus save the overhead of inserting/updating/indexing the majority of the rows I am currently handling the usual way.
It seems that my best bet is to implement a custom EntityPersister and associate it with the appropriate entities. Is this the right way to go? The hibernate docs are not bad, but it is a fairly complex class that needs to be implemented and I am left with lots of questions after looking at the javadoc. Can you point me to a concrete, yet simple example that I can use as a starting point?
Any thoughts about another way to approach this optimization?
I've run into the same problem with storing large amounts of binary data. The solution I found worked best is a denormalization of the object model. For example, I create a master record, and then a second object that holds the binary data. On the master, use a @OneToOne mapping to the secondary object, but mark the association as lazy. Now the data will only be loaded if you need it.
The one thing that might slow you down is the outer join that Hibernate performs with all objects of this type. To avoid it, you can mark the association as mandatory (optional = false). But if the database doesn't give you a huge performance hit, I suggest you leave it alone. I found that Hibernate has a tendency to load the binary data immediately if I tried to use a regular join.
Finally, if you need to retrieve a lot of the binary data in a single SQL call, use an HQL fetch join. For example: from Article a join fetch a.data, where a.data is the one-to-one relationship to the binary holder. The HQL compiler will see this as an instruction to get all the data in a single SQL call.
HTH
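The denormalization described in this answer might be mapped roughly like this. Article/ArticleData and the field names are assumptions standing in for the master record and the binary holder:

```java
@Entity
public class Article {
    @Id @GeneratedValue
    private Long id;

    // Lazy + optional = false: Hibernate can use a proxy instead of
    // eagerly outer-joining the binary data on every load.
    @OneToOne(fetch = FetchType.LAZY, optional = false,
              cascade = CascadeType.ALL, orphanRemoval = true)
    private ArticleData data;
}

@Entity
public class ArticleData {
    @Id @GeneratedValue
    private Long id;

    @Lob
    private byte[] payload; // the large binary content, kept out of the master row
}
```

With this split, the master table stays small and fast to scan, and the payload is only read when a.data is actually dereferenced or explicitly fetch-joined.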
