Hibernate query for performance


I have a table that looks something like this:
@Table
public class Person {
private String name;
private String address;
...
private String score;
}
In my database I now have a lot of persons with names, addresses and scores. Let's say I retrieve a list of persons from another system, where some of the persons already exist in the database and some are new.
Before I persist them in my DB I want to check whether they already exist (to avoid duplicates), and maybe change the score if the incoming person is the same as one I already have, but with a different score.
What's the best query to write if I want to select all persons that already exist (e.g. same name and address)? My table can contain a huge number of persons, and the list of persons I get from the other system (new or with updated scores) is also big. I need a query that is all about performance :-).
I am using Java and Hibernate. Anyone?
EDIT: The final SQL will probably look something like
select * from Person where name='Paul' AND address='road1'
OR name='John' AND address='road2'
OR name='Stella' AND address='road3'
and many, many more. The above SQL at least explains what I want.

One way of doing this is to outer join both tables and list all the persons that don't exist on one side, like this (T-SQL):
SELECT left.* from db1.owner.persons left LEFT JOIN db2.owner.persons right ON left.name=right.name AND left.address=right.address WHERE right.id IS NULL
Then you can use the CreateSQLQuery method of ISession to get the list of persons.
In C# we write it like this:
var list=session.CreateSQLQuery(queryString,"left",new []{typeof(Person)}).List();
but I don't think it's much different in Java.
If you want better performance from this query, you will probably need an index on each table (over name and address, for example).

If I understand correctly, you already have all your "external" persons in memory.
I would create a Map<String, ExternalPerson> containing all your external persons indexed by name.
I would then ask the keySet() of this map to get the list of persons to get from the database.
I would then execute the following query:
select p from Person p where p.name in (:names)
You just have to make sure that the number of names isn't above the limit imposed by your database (1000 in Oracle). If it is, you'll have to split the set into several subsets and repeat the query for each subset.
Then iterate over the query results. For each person found, get its corresponding external person from the map of external persons, and update the score of the current person. Then remove the external person from the map.
At the end of the process, the map contains the external persons that don't exist in the database, and must be created.
If the set of persons is really large, make sure to use query.scroll() rather than query.list() to iterate through the persons, and regularly flush and clear the session as explained in this section of the reference manual, to avoid memory problems.
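The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: PersonRow, PersonSync and the findByNames function are hypothetical names that do not come from the answer, and the persons are matched by name only, as in the query above (matching on name and address would need a composite key). In a real application findByNames would run the HQL query once per chunk, for example with Query.setParameterList("names", chunk).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the Person entity; a real one would be mapped by Hibernate.
final class PersonRow {
    final String name;
    String score;

    PersonRow(String name, String score) {
        this.name = name;
        this.score = score;
    }
}

final class PersonSync {
    // Splits the names into sublists of at most chunkSize elements,
    // so that each IN (...) list stays under the database limit.
    static List<List<String>> chunks(Collection<String> names, int chunkSize) {
        List<String> all = new ArrayList<>(names);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < all.size(); i += chunkSize) {
            result.add(all.subList(i, Math.min(i + chunkSize, all.size())));
        }
        return result;
    }

    // Updates the score of every person returned by findByNames and
    // returns the external persons that were not found (those to create).
    static Map<String, PersonRow> sync(Map<String, PersonRow> external,
                                       Function<List<String>, List<PersonRow>> findByNames,
                                       int chunkSize) {
        Map<String, PersonRow> remaining = new HashMap<>(external);
        for (List<String> chunk : chunks(external.keySet(), chunkSize)) {
            for (PersonRow existing : findByNames.apply(chunk)) {
                PersonRow incoming = remaining.remove(existing.name);
                if (incoming != null) {
                    existing.score = incoming.score;
                }
            }
        }
        return remaining;
    }
}
```

With Oracle, chunkSize would be at most 1000. The map returned by sync holds exactly the persons that still have to be persisted.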

Related

How to make a Hibernate SearchSession return results with unique attributes?

I am working on using the Hibernate SearchSession class in Java to perform a search against a database, the code I currently have to search a table looks something like this:
SearchSession searchSession = Search.session(entityManagerFactory.unwrap(SessionFactory.class).withOptions()
.tenantIdentifier("locations").openSession());
SearchResult<Location> result = searchSession.search(Location.class)
.where( f -> f.bool()
.must( f.match()
.field("locationName")
.matching((phrase)).fuzzy())
).fetch(page * limit, limit);
This search works and properly returns results from the database, but there is no uniqueness constraint on the locationName column and the database holds multiple records with the same value in locationName. As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before? Applying a uniqueness constraint to the database table isn't an option in this scenario, and we were hoping there's a way to handle filtering out duplicate values in the session over taking the results from the search and removing duplicate values separately.

Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before?
Not really, at least not at the moment.
If you're using the Elasticsearch backend and are fine with going native, you can insert native JSON into the Elasticsearch request, in particular collapsing.
I think something like this might work:
SearchResult<Location> result = searchSession.search( Location.class )
.extension( ElasticsearchExtension.get() )
.where( f -> f.bool()
.must( f.match()
.field("locationName")
.matching((phrase)).fuzzy())
)
.requestTransformer( context -> {
JsonObject collapse = new JsonObject();
collapse.addProperty("field", "locationName_keyword");
JsonObject body = context.body();
body.add( "collapse", collapse );
} )
// You probably need a sort, as well:
.sort(f -> f.field("id"))
.fetch( page * limit, limit );
You will need to add a locationName_keyword field to your Location entity:
@Indexed
@Entity
public class Location {
// ...
@Id
@GenericField(sortable = Sortable.YES) // Add this
private Long id;
// ...
@FullTextField
@KeywordField(name = "locationName_keyword", sortable = Sortable.YES) // Add this
private String locationName;
// ...
}
(You may need to also assign a custom normalizer to the locationName_keyword field, if the duplicate locations have a slightly different locationName (different case, ...))
Note however that the "total hit count" in the Search result will indicate the number of hits before collapsing. So if there's only one matching locationName, but 5 Location instances with that name, the total hit count will be 5, but users will only see one hit. They'll be confused for sure.
That being said, it might be worth having another look at your situation to determine whether collapsing is really necessary here:
As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
If you have multiple documents with the same locationName, then surely you have multiple rows in the database with the same locationName? Duplication doesn't appear spontaneously when indexing.
I would say the first thing to do would be to step back, and consider whether you really want to query the Location entity, or if another, related entity wouldn't make more sense. When two locations have the same name, do they have a relationship to another, common entity instance (e.g. of type Shop, ...)?
=> If so, you should probably query that entity type instead (.search(Shop.class)), and take advantage of @IndexedEmbedded to allow filtering based on Location properties (i.e. add @IndexedEmbedded to the location association in the Shop entity type, then use the field location.locationName when adding a predicate that should match the location name).
If there is no such related, common entity instance, then I would try to find out why locations are duplicated exactly, and more importantly why that duplication makes sense in the database, but not to users:
Are the users not interested in all the locations? Then maybe you should add another filter to your query (by "type", ...) that would help remove duplicates. If necessary, you could even run multiple search queries: first one with very strict filters, and if there are no hits, fall back to another one with less strict filters.
Are you using some kind of versioning or soft deletion? Then maybe you should avoid indexing soft-deleted entities or older versions; you can do that with conditional indexing or, if that doesn't work, with a filter in your search query.
If your data really is duplicated (legacy database, ...) without any way to pick a duplicate over another except by "just picking the first one", you could consider whether you need an aggregation instead of full-blown search. Are you just looking for the top location names, or maybe a count of locations by name? Then aggregations are the right tool.

Is it possible to query both from entity and list object

I want to query from both the entity and a list object. Say I have an entity called "Customer", and I have a list of potential customers of object Potential.
Customer {
id, name, address, ...
}
Potential {
id, name, address, ...
}
In my repository I write the query as follows if I want to customize the query to get customers
@Query("SELECT c FROM Customer c WHERE c.status = :status")
List<Customer> findAllSpecialCustomers(String status);
But if I currently have a list object
List<Potential> potentials
And I want to include it in the query above. How should I do it? Is it even possible? The reason I want to do this is that the two entities represent different tables, but I want to do sorting and pagination on the combined records of both. Also, potentials is queried from a graph database, while the Customer entity comes from a MySQL database.
Basically, I have a potentials list object that is queried from a graph database. I want to union it with the Customer entity from a MySQL database through @Query and apply sorting and pagination to the combined records.

Use a native union query and instead of using Customer or Potential, create another POJO class to map query results.
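Since the potentials come from a graph database and the customers from MySQL, a single native union may not be able to reach both sources. One alternative, which is not part of the answer above, is to map both sides to the common POJO and then sort and page in memory. The sketch below assumes both lists fit in memory; CombinedRow and CombinedPager are hypothetical names, and sorting by name then id is only an example ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical common shape for a row coming from either source.
final class CombinedRow {
    final String source; // "customer" or "potential"
    final long id;
    final String name;

    CombinedRow(String source, long id, String name) {
        this.source = source;
        this.id = id;
        this.name = name;
    }
}

final class CombinedPager {
    // Merges both lists, sorts by name and then id, and returns one page.
    // pageIndex is zero-based.
    static List<CombinedRow> page(List<CombinedRow> customers,
                                  List<CombinedRow> potentials,
                                  int pageIndex,
                                  int pageSize) {
        List<CombinedRow> all = new ArrayList<>(customers);
        all.addAll(potentials);
        return all.stream()
                .sorted(Comparator.comparing((CombinedRow r) -> r.name)
                        .thenComparingLong(r -> r.id))
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

This trades database-side paging for simplicity, so it is only reasonable while the combined result stays small.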

I assume that there is some property in your Potential class that identifies a Customer. For argument's sake, let's assume that the id field in the two classes are the same, e.g., you want to pair up the Potential with id == 123 with the Customer with id == 123.
The simplest thing that I can think of is to map the list of Potentials to a list of Integers (or whatever type the id is), and then use that as a parameter to an "in" clause in your Customer query. For example,
@Query("SELECT c FROM Customer c WHERE c.id in :idList")
List<Customer> findCustomersById(List<Integer> idList);
and
findCustomersById(
potentials
.stream()
.map(Potential::getId)
.collect(Collectors.toList())
);
As far as "zipping" the two lists, i.e. pairing up the matches from the two lists, I'll leave that as an exercise for you :-)
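For what it's worth, one possible way to do that pairing is to index the customers by id and then walk the potentials. This is a sketch only: CustomerRow, PotentialRow, Match and PotentialMatcher are hypothetical stand-ins for the real classes, and it keeps the assumption made above that matching records share the same id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal shapes; the real Customer and Potential have more fields.
final class CustomerRow {
    final int id;
    final String name;

    CustomerRow(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class PotentialRow {
    final int id;
    final String name;

    PotentialRow(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// One matched pair: a potential and the customer with the same id.
final class Match {
    final PotentialRow potential;
    final CustomerRow customer;

    Match(PotentialRow potential, CustomerRow customer) {
        this.potential = potential;
        this.customer = customer;
    }
}

final class PotentialMatcher {
    // Pairs each potential with the customer that has the same id;
    // potentials without a matching customer are skipped.
    static List<Match> pairById(List<PotentialRow> potentials, List<CustomerRow> customers) {
        Map<Integer, CustomerRow> byId = new HashMap<>();
        for (CustomerRow c : customers) {
            byId.put(c.id, c);
        }
        List<Match> matches = new ArrayList<>();
        for (PotentialRow p : potentials) {
            CustomerRow c = byId.get(p.id);
            if (c != null) {
                matches.add(new Match(p, c));
            }
        }
        return matches;
    }
}
```

The result keeps the order of the potentials list, so any sorting applied to the potentials carries over to the pairs.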

How can a map be used to search the list of employee objects on the basis of different parameters?

I was asked this question in an interview today to which I explained the best to my abilities. But I still don't understand if this is the correct answer.
There is a cache which has the Employee object as the key. The cache is populated with data from the database. There is a UI where we can enter any or all of the 3 attributes of the Employee object: name, ID and date of joining. A search can lead to multiple matching results. To achieve this we need to look the data up in the cache.
To this I replied saying that my map would be of the structure - >. for the same EmployeeDetails object ,
I will have multiple keys in the map(EmployeeDetails class is the object which contains complete detail of the Employee including address etc. Employee object just has 3 attributes - name, ID and date of joining.).
One of the objects with only name populated. The other with ID populated and the third one with date of joining populated. And now with the combination of attributes. So the map will be having the following keys -
Employee object with only the name populated -> Value would be the list of all the Employee objects with the same name.
Employee object with only the ID populated -> Value would be the list of all the Employee objects with the same ID. Ideally the list size in this case should be 1.
Employee objects with only the Date Of Joining -> List of all the employee objects with the same date of joining.
Similarly there would be number of other Employee objects. For one such employee , all the three attributes - name , ID and date of joining would be populated.
In this way, I could have met the requirement to display all the matching employees when only some of the attributes (name, ID and date of joining) are set on the UI.
I just want to understand if this is the correct way to achieve the outcome (display of list of matching results on the UI). Since I did not get selected, I believe there is something else which I possibly missed!

A reasonable short answer is to maintain 3 separate maps, one for each of the 3 fields, with each one mapping from a field value to the list of employees with that value for the field.
To perform a lookup, retrieve the lists for each of the values that the user specified, and then (if you have more than one criterion) iterate through the shortest one to filter out employees that don't match the other criteria.
In the cases where you have more than one criterion, one of them has to be name or ID. In real life, the lists for these fields will be very short, so you won't have to iterate through any large collections.
This solution essentially uses the maps as indexes and implements the query like a relational DB. If you were to mention that in an interview, you would get extra points, but you'd need to be able to back it up.
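A minimal sketch of that idea, assuming IDs and names are strings and a null argument means "criterion not entered". The class names EmployeeRecord and EmployeeIndex are made up for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical employee with the three searchable attributes.
final class EmployeeRecord {
    final String id;
    final String name;
    final LocalDate joined;

    EmployeeRecord(String id, String name, LocalDate joined) {
        this.id = id;
        this.name = name;
        this.joined = joined;
    }
}

final class EmployeeIndex {
    // One map per field; each acts like a database index on that field.
    private final Map<String, List<EmployeeRecord>> byId = new HashMap<>();
    private final Map<String, List<EmployeeRecord>> byName = new HashMap<>();
    private final Map<LocalDate, List<EmployeeRecord>> byJoined = new HashMap<>();

    void add(EmployeeRecord e) {
        byId.computeIfAbsent(e.id, k -> new ArrayList<>()).add(e);
        byName.computeIfAbsent(e.name, k -> new ArrayList<>()).add(e);
        byJoined.computeIfAbsent(e.joined, k -> new ArrayList<>()).add(e);
    }

    // A null argument means the user did not enter that criterion.
    List<EmployeeRecord> search(String id, String name, LocalDate joined) {
        List<List<EmployeeRecord>> candidates = new ArrayList<>();
        if (id != null) candidates.add(byId.getOrDefault(id, List.of()));
        if (name != null) candidates.add(byName.getOrDefault(name, List.of()));
        if (joined != null) candidates.add(byJoined.getOrDefault(joined, List.of()));
        if (candidates.isEmpty()) {
            return List.of(); // no criteria entered
        }
        // Pick the shortest candidate list, then filter it by all criteria.
        List<EmployeeRecord> shortest = candidates.get(0);
        for (List<EmployeeRecord> c : candidates) {
            if (c.size() < shortest.size()) {
                shortest = c;
            }
        }
        List<EmployeeRecord> result = new ArrayList<>();
        for (EmployeeRecord e : shortest) {
            boolean matches = (id == null || e.id.equals(id))
                    && (name == null || e.name.equals(name))
                    && (joined == null || e.joined.equals(joined));
            if (matches) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Returning an empty list when no criterion is entered is a choice made for the sketch; returning every employee would be just as defensible.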

One of the neat things about Java 8 is the Streams API. With this new API, you can hold all of those Employee objects in a plain List and get the same results you were trying to achieve with multiple mapping objects, with less overhead.
See, this API has a .filter() method that you can pass over a List that has been transformed into a Stream to only return objects that meet the criteria described in the body of the filter.
List<Employee> emps = getEmps();
List<Employee> matchedEmps = emps.stream().filter((e)->e.getID().equals(searchedID)).filter((e)->e.getName().equals(searchedName)).collect(Collectors.toList());
As you can see you can chain filters to match multiple criteria, although it may be more efficient just to have all matching done in one filter:
List<Employee> matchedEmps = emps.stream().filter((e)->{boolean matches = e.getID().equals(searchedID);return matches && e.getName().equals(searchedName);}).collect(Collectors.toList());

I would have a map with the Employee object as key and EmployeeDetails as value. I would get the Collection of values from the map, then create a custom Comparator for each specific search, iterate through the values collection and use the comparator to compare the values. Matching values are added to a results Collection during the iteration.

One way is to create an Employee -> EmployeeDetails map; to search for a given employee ID you then have to iterate over all the keys. The complexity will be O(N).
Second, to improve the time complexity: even in a database we use indexing to avoid a full scan. You can try something similar here, i.e. create index maps such as id -> Employee and email -> Employee, and update them whenever you add an employee to the main map.
Third, if possible, you can create a trie and store the employee at the end node. After getting the employee, you can get the employee details.

How to paginate at field level in cypher query?

Actually, I have a node say NodeA which contains following fields:
id
name
friends
friends field is nothing but Set<String> friends in NodeA class.
If the size of friends is huge, say 3000 or 5000 or more, how can I paginate through this field of NodeA?
For instance: I am firing below query:
start event=node(12) return event.friends; which returns me list of friends as :
["abc","devid","rao","amn","xyz","pqr"].
Is there any way to select only the first 3 friends, then the next 3, and so on?

Currently, there is no generic way to do that, I'm afraid. However, such a function is in the works, according to the devs: http://grokbase.com/t/gg/neo4j/137hhxyer6/cypher-getting-the-first-n-elements-of-a-collection
For now, you could only model your friends as nodes of their own and connect them to NodeA via relationships, e.g. of type HAS_FRIEND. Then you could do some kind of paging via skip and limit.
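If remodeling is not an option, a stopgap (not suggested in the answer above) is to page over the collection on the client after it has been loaded. This does not reduce what Neo4j returns; it only slices the set in Java. FriendPager is a hypothetical helper name, and the friends are sorted first because a Set has no stable order.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class FriendPager {
    // Returns one page of friends in alphabetical order.
    // pageIndex is zero-based.
    static List<String> page(Set<String> friends, int pageIndex, int pageSize) {
        return friends.stream()
                .sorted()
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```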

Multiple OR conditions vs several individual JPQL/Hibernate queries on entities

I need to query the database for different combinations of elements from an already received result object.
For instance, I get a list of Person entities. For each person in that list, I need to get the list of addresses.
There are two ways to do it:
Iterate over the Person entities and fire a query for each one to get the list of addresses for that person.
Build a query dynamically with elements from the Person entities and fire ONE single query to pull the address lists for all persons, then iterate over the Person entities again and match the address list to each person.
I don't know how many Person entities I might get. So what is the better approach in terms of performance and practice?
So, if I have 100 Person entities, the first approach is going to be 100 queries, vs the second approach with a huge query like the one below
from address where (person.id = 1 and person.zip = 393)
or (person.id = 2 and person.zip = 123)
or (person.id = 3 and person.zip = 345)
.... // 10 times.
Which one is better? Any restrictions / limitation on Or conditions in Oracle?
Is there a better approach? Batch queries?

You can use Hibernate with eager loading to get the results you want directly, by loading the persons with the required restrictions. Or, if you want to stick to lazy loading, try an inner join between Person and Address so that you get back a list of arrays containing the results.
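Whichever single query is used, its rows still have to be matched back to each person, as the second approach in the question describes. A minimal sketch of that grouping step follows; AddressRow and AddressGrouping are hypothetical names, and the rows are assumed to carry the owning person's id.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row shape: one address plus the id of the person it belongs to.
final class AddressRow {
    final Long personId;
    final String street;

    AddressRow(Long personId, String street) {
        this.personId = personId;
        this.street = street;
    }
}

final class AddressGrouping {
    // Groups the rows of one bulk query by person id, so each person's
    // addresses can be looked up without a further query.
    static Map<Long, List<AddressRow>> byPerson(List<AddressRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(r -> r.personId));
    }
}
```

Persons without any address simply have no entry in the map, so callers should treat a missing key as an empty list.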
