I was trying to use Spring's CrudRepository with Hibernate to delete rows by a non-primary-key column, using a deleteByColumnName method. However, the query that actually gets executed is very inefficient and too slow in practice.
Suppose I have two tables Project and Employee, and each employee is in charge of some projects, which implies that the Project table has a field employee_id. Now I would like to delete some projects by employee_id. I wrote something like
public interface ProjectRepository extends CrudRepository<Project, String> {
@Transactional
void deleteByEmployeeId(String employeeId);
}
What I expected is that Hibernate would execute the following query for this method:
DELETE FROM Project
WHERE employee_id = ?
However, Hibernate executes it in a drastically slower way. It first runs
SELECT id FROM Project
WHERE employee_id = ?
Hibernate stores the above result in a list, and then executes
DELETE FROM Project
WHERE id = ?
N times (although it does send these statements in batches).
To address this inefficiency problem, I have to override the method by writing SQL directly, like
public interface ProjectRepository extends CrudRepository<Project, String> {
@Query("DELETE FROM Project p WHERE p.employeeId = ?1")
@Modifying
@Transactional
void deleteByEmployeeId(String employeeId);
}
Then the behavior will be exactly the same as what I am expecting.
The performance difference is substantial when I delete about 1k rows from a table containing around 500k entries. The first method takes 45 seconds to finish the deletion, while the second takes only 250 ms!
The reason I use Hibernate is to take advantage of its ORM approach, which avoids writing SQL directly and is easier to maintain in the long run. Does anyone know how to make Hibernate execute the deletion the way my second method does, without writing the query by hand? Is there something I am missing that would improve Hibernate's performance here?
Thanks in advance!
Here you can find a good explanation of why Hibernate performs this badly when deleting Project items: Best Practices for Many-To-One and One-To-Many Association Mappings
Related
I am writing Java code to get all the rows from a database table.
I am using CrudRepository with the method below.
public interface StudentRepository extends CrudRepository<Student, Long>
{
public List<Student> findById(long id);
}
or
@Query(value = "SELECT s FROM Student s")
List<Student> customMethod(long id);
Which method is faster? Is the built-in repository method faster than our custom query?
Thanks in advance.
The default findById provided by Spring Data Repository and a query-annotated method have significantly different semantics. But, to keep it short, I will try to focus on differences in performance exclusively.
Unless you have query cache enabled, a query-annotated method will always hit the database with a query.
findById, on the other hand, ultimately calls EntityManager.find(). EntityManager.find() looks up the entity in the persistence context first. That means if the entity has already been loaded into the context, the call will not hit the underlying database.
As a side note, if you're curious as to how Spring implements the default repository methods, have a look at the source of SimpleJpaRepository.
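The lookup order described above (persistence context first, database second) can be illustrated with a minimal plain-Java sketch. TinyPersistenceContext is a hypothetical stand-in written for this example, not a JPA or Spring class, and the Function passed to it merely simulates a SELECT by primary key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a persistence context; NOT a JPA class.
final class TinyPersistenceContext<ID, E> {
    private final Map<ID, E> managed = new HashMap<>();
    private final Function<ID, E> database; // simulates a SELECT by primary key
    private int databaseHits = 0;

    TinyPersistenceContext(Function<ID, E> database) {
        this.database = database;
    }

    // Same order as described above: check the context, then fall back to the database.
    E find(ID id) {
        E cached = managed.get(id);
        if (cached != null) {
            return cached; // already loaded: no database round trip
        }
        databaseHits++;
        E loaded = database.apply(id);
        if (loaded != null) {
            managed.put(id, loaded); // remember the entity for later lookups
        }
        return loaded;
    }

    int databaseHits() {
        return databaseHits;
    }
}
```

Calling find twice with the same id returns the same instance and increments databaseHits only once, which is the effect a query-annotated method does not get unless a query cache is enabled.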
You have to understand that the findAll() method eventually generates a query for the selection as well. The only way to know which is faster is to test it, but I don't think you will gain a significant performance boost. The generated query methods, on the other hand, are extremely easy to understand and use. So, if you are hesitating between the two, I would stick to findAll() or the other Spring Data repository methods.
Hi, I used Spring Data to map my entity and repository. The mapping is very simple:
public class Car {
Set<Part> parts;
}
public class Part {
}
I use the findAllByIds(Iterable) method of my Spring Data repository, and it generates a nice SQL statement of the form:
select from CAR where id in (?, ?, ?, ?)
Then, for each Car, it executes exactly one SQL statement:
Select from Part where car_id = ?
My problem starts when the related parts are fetched. It appears that they are fetched one Car at a time. Is there something in Spring Data JDBC equivalent to batch fetching in Hibernate?
If the answer is negative, is there some relatively easy way to implement it?
Unfortunately, the short answer is "No" to both questions right now.
If you want to implement batching for selects what you would need to do is to come up with
a) a new implementation of the DataAccessStrategy which essentially implements all the CRUD functionality, and/or
b) a new EntityRowMapper which converts ResultSet rows into entities.
The first one is needed if you want to execute a different SQL statement to start with.
The second one is needed if you consider changing only the subsequent SQL sufficient.
There are issues around batching that you might want to track; if the exact variant you are looking for doesn't exist, feel free to create another one.
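Whichever of the two extension points is used, the core idea is to replace the per-Car statement with a single statement covering all Car ids and then distribute the rows. The following is only a plain-Java sketch of that idea, not a Spring Data JDBC implementation: PartBatchLoading and PartRow are hypothetical names, and the name column on Part is an assumption, since the question does not show Part's fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row shape: the car_id foreign key plus an assumed "name" column.
final class PartRow {
    final long carId;
    final String name;

    PartRow(long carId, String name) {
        this.carId = carId;
        this.name = name;
    }
}

final class PartBatchLoading {
    // Builds one statement for all cars instead of one statement per car.
    static String selectPartsSql(int carCount) {
        if (carCount < 1) {
            throw new IllegalArgumentException("need at least one car id");
        }
        String placeholders = String.join(", ", Collections.nCopies(carCount, "?"));
        return "SELECT * FROM PART WHERE car_id IN (" + placeholders + ")";
    }

    // Distributes the rows of that single result set back to their cars.
    static Map<Long, List<String>> groupByCar(List<PartRow> rows) {
        Map<Long, List<String>> partsByCar = new LinkedHashMap<>();
        for (PartRow row : rows) {
            partsByCar.computeIfAbsent(row.carId, id -> new ArrayList<>()).add(row.name);
        }
        return partsByCar;
    }
}
```

A custom DataAccessStrategy or row mapper would still have to bind the ids and attach the grouped parts to the Car instances; that wiring is deliberately left out here.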
My application has a somewhat non-standard setup: an entity layer mapped to MySQL and a domain layer used in the controllers. One domain object is built from several entities. Can it be populated with a single JPQL query?
entity layer:
PersonEntity table
EventEntity table
EventVisitorEntity table
PersonEntity many to many EventEntity
EventVisitorEntity interim table
domain layer:
class PersonInfo {
Person person;
List<PersonEvent> personEvent
...
}
Currently I load all Person objects, take their ids, and fetch the PersonEvent objects using this query:
@Query("SELECT new domain.PersonEvent(ev.personId,ev.eventId,e.name,ev.state)" +
" FROM EventVisitorEntity AS ev ,EventEntity AS e WHERE e.id = ev.eventId AND ev.personId IN (?1)")
List<PersonEvent> findEventsForPerson(List<Integer> ids);
Is it possible to write one query that returns the persons together with their personEvents,
using the constructor below?
public PersonInfo(Person person, List<PersonEvent> personEvents)
I highly doubt it is possible. JPQL subqueries, which might help you collect the personEvents, are allowed only in WHERE and HAVING clauses.
Instead, I'd suggest that you keep the query as it is and move the gathering logic to your DAO tier. This link might be helpful: https://dzone.com/articles/add-custom-functionality-to-a-spring-data-reposito. Declare a method List<PersonEvent> findEventsForPerson(List<Integer> ids) and implement a custom repository for it, doing all the necessary JPQL queries and combinations there. But beware of the N+1 issue.
Also it may be convenient to use entity graphs in such custom implementation.
EDIT: After rereading the spec with a fresh mind, I realized that I was mistaken in saying that subqueries are allowed only in WHERE/HAVING clauses. The spec says that they may be used there, which doesn't exclude other places. Anyway, even if it is possible, such an approach (extracting the relation via subqueries) would most probably lead to N+1 issues, unless JPA implementors are smart enough to anticipate that (I wouldn't count on it).
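The gathering step suggested above (two queries, then combining the results in the DAO tier) can be sketched in plain Java as follows. The classes are simplified stand-ins for the ones in the question: the id and name fields on Person and the String type of state are assumptions, and PersonInfoAssembler is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified stand-ins for the question's domain classes (field types are assumed).
final class Person {
    final int id;
    final String name;
    Person(int id, String name) { this.id = id; this.name = name; }
}

final class PersonEvent {
    final int personId;
    final int eventId;
    final String name;
    final String state;
    PersonEvent(int personId, int eventId, String name, String state) {
        this.personId = personId;
        this.eventId = eventId;
        this.name = name;
        this.state = state;
    }
}

final class PersonInfo {
    final Person person;
    final List<PersonEvent> personEvents;
    PersonInfo(Person person, List<PersonEvent> personEvents) {
        this.person = person;
        this.personEvents = personEvents;
    }
}

// Hypothetical helper for the custom repository implementation.
final class PersonInfoAssembler {
    // Combines the results of two queries in memory, so the number of SQL
    // statements stays at two no matter how many persons are loaded.
    static List<PersonInfo> assemble(List<Person> persons, List<PersonEvent> events) {
        Map<Integer, List<PersonEvent>> eventsByPerson =
                events.stream().collect(Collectors.groupingBy(e -> e.personId));
        List<PersonInfo> result = new ArrayList<>();
        for (Person person : persons) {
            result.add(new PersonInfo(
                    person,
                    eventsByPerson.getOrDefault(person.id, Collections.emptyList())));
        }
        return result;
    }
}
```

In a custom repository implementation, persons would come from one query and events from findEventsForPerson(ids), which avoids issuing one event query per person.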
What is the difference between the delete(...) and deleteInBatch(...) methods in JpaRepository in Spring? The second one "deletes items in one SQL statement", but what does that mean from the application/database perspective? Why are there two different methods with similar results, and when is it better to use one or the other?
EDIT:
The same applies also for deleteAll() and deleteAllInBatch() ...
The answers here are not complete!
First off, let's check the documentation!
void deleteInBatch(Iterable<T> entities)
Deletes the given entities in a batch which means it will create a single Query.
So the "delete[All]InBatch" methods will use a JPA bulk delete, like "DELETE FROM table [WHERE ...]". That may be WAY more efficient, but it has some caveats:
This will not call any JPA/Hibernate lifecycle hooks you might have (e.g. @PreRemove)
It will NOT CASCADE to other entities
You have to clear your persistence context or just assume it is invalidated.
That's because JPA will issue a bulk DELETE statement to the database, bypassing the cache etc. and thus can't know which entities were affected.
See Hibernate Docs
The actual code in Spring Data JPA
And while I can't find a specific article here, I'd recommend everything Vlad Mihalcea has written, to gain a deeper understanding of JPA.
TLDR: The "inBatch" methods use bulk DELETE statements, which can be drastically faster, but they come with some caveats because they bypass the JPA cache. You should really know how they work and when to use them in order to benefit.
The delete method is going to delete your entity in one operation. The deleteInBatch is going to batch several delete-statements and delete them as 1 operation.
If you need a lot of delete operations the batch-deletion might be faster.
deleteInBatch(...) in the log would look like this:
DELETE FROM table_name WHERE (((((((? = id) OR (? = id)) OR (? = id)) OR (? = id)) OR (? = id)) OR (? = id)) OR (? = id))
That might lead to a problem if there is a large amount of data to be deleted and the statement reaches the maximum query size of the SQL server:
Maximum size for a SQL Server Query? IN clause? Is there a Better Approach
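One common way around that limit is to split the ids into fixed-size chunks and issue one delete per chunk. The sketch below shows only the chunking step in plain Java; IdChunks is a hypothetical helper name, and a suitable chunk size depends on the database and driver, so it is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of ids into chunks of at most chunkSize elements.
final class IdChunks {
    static <T> List<List<T>> partition(List<T> ids, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk can then be passed to a separate batch delete call, which keeps every generated statement below the size limit at the cost of a few extra round trips.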
Just to add a curious piece of information.
You can't create your custom delete using 'batch' on the method name and wait for spring data to resolve it, for example, you can't do this:
void deleteByYourAttributeInBatch(Iterable<YourObject> object);
You need to do something like this:
@Modifying
@Transactional
@Query("DELETE FROM YourObject o WHERE o.yourAttribute IN (:objects)")
void deleteByYourAttributeInBatch(@Param("objects") Iterable<YourObject> objects);
Maybe it's an issue to spring-data ;)
I have an entity class set up in Java, with a many-to-many relationship to another class. However, rather than selecting the entire entity collection, I'd like to select only a property from the child entities. The reason for doing this is that it will lower the amount of data being loaded into the system as I don't always need the entire entity depending on my view.
This is what I have so far:
@Entity
public class Disposition {
...
@ManyToMany
private List<Project> projects;
...
}
This works fine and retrieves a list of Project instances. However, I don't want to get all the Projects for the Disposition; I only want to retrieve Project.name.
The only solution I've been able to come up with so far is using the @Formula annotation but I'd like to avoid this if possible since it requires writing native SQL instead of HQL.
This view is read-only so I don't expect any changes to the data to be persisted.
You can use HQL to get only the child's name. It would look something like
"select p.name from Project p where p.parent_id = ?"
You would have to adapt the names in that query to your model, and use a parameterized query to replace the ? with the id of the parent.
It is common to have tailored DAO methods for exactly this sort of situation.
This is where object-relational mapping cannot help you anymore. But you can use the Query API, which allows you to query arbitrary objects with HQL, not SQL. Isn't @Formula using HQL, too?
It is not Hibernate, but the Ebean project could interest you. Ebean is an ORM project that uses the JPA annotations and allows lazy (partial) loading of objects.
In your example, getting only project names would result in this code:
List<Project> projects = Ebean.find(Project.class)
.select("name") // Only name properties are loaded
.where().eq("disposition", yourDisposition)
.findList();
Then, if you try to get the project owner (or any other property), these properties will be lazily loaded by Ebean.
Check out org.hibernate.criterion.Projections. Given a Criteria you can simply do the following:
criteria.setProjection(Projections.property("name"));