I have a node, say NodeA, which contains the following fields:
id
name
friends
The friends field is a Set&lt;String&gt; in the NodeA class.
If the size of friends is huge, say 3000 or 5000 or more, how can I paginate through this field of NodeA?
For instance, I am firing the query below:
start event=node(12) return event.friends; which returns the list of friends as:
["abc","devid","rao","amn","xyz","pqr"].
Is there any way I can select only the first 3 friends, and so on?
Currently, there is no generic way to do that, I'm afraid. However, such a function is in the works, according to the devs: http://grokbase.com/t/gg/neo4j/137hhxyer6/cypher-getting-the-first-n-elements-of-a-collection
For now, you could only model your friends as nodes of their own and connect them to NodeA via relationships, e.g. of type HAS_FRIEND. Then you could do some kind of paging via SKIP and LIMIT.
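With friends modeled as separate nodes, the paging query might look like this (the HAS_FRIEND type and the friend.name property are just assumptions for illustration):

```cypher
START person=node(12)
MATCH (person)-[:HAS_FRIEND]->(friend)
RETURN friend.name
ORDER BY friend.name
SKIP 0 LIMIT 3
```

Increasing SKIP by the page size on each request then gives you the next page of friends.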
I am working on using the Hibernate SearchSession class in Java to perform a search against a database, the code I currently have to search a table looks something like this:
SearchSession searchSession = Search.session( entityManagerFactory.unwrap( SessionFactory.class )
        .withOptions()
        .tenantIdentifier( "locations" )
        .openSession() );
SearchResult<Location> result = searchSession.search( Location.class )
        .where( f -> f.bool()
                .must( f.match()
                        .field( "locationName" )
                        .matching( phrase ).fuzzy() )
        )
        .fetch( page * limit, limit );
This search works and properly returns results from the database, but there is no uniqueness constraint on the locationName column and the database holds multiple records with the same value in locationName. As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before? Applying a uniqueness constraint to the database table isn't an option in this scenario, and we were hoping there's a way to handle filtering out duplicate values in the session over taking the results from the search and removing duplicate values separately.
Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before?
Not really, at least not at the moment.
If you're using the Elasticsearch backend and are fine with going native, you can insert native JSON into the Elasticsearch request, in particular collapsing.
I think something like this might work:
SearchResult<Location> result = searchSession.search( Location.class )
        .extension( ElasticsearchExtension.get() )
        .where( f -> f.bool()
                .must( f.match()
                        .field( "locationName" )
                        .matching( phrase ).fuzzy() )
        )
        .requestTransformer( context -> {
            JsonObject collapse = new JsonObject();
            collapse.addProperty( "field", "locationName_keyword" );
            JsonObject body = context.body();
            body.add( "collapse", collapse );
        } )
        // You probably need a sort, as well:
        .sort( f -> f.field( "id" ) )
        .fetch( page * limit, limit );
You will need to add a locationName_keyword field to your Location entity:
@Indexed
@Entity
public class Location {
    // ...
    @Id
    @GenericField(sortable = Sortable.YES) // Add this
    private Long id;
    // ...
    @FullTextField
    @KeywordField(name = "locationName_keyword", sortable = Sortable.YES) // Add this
    private String locationName;
    // ...
}
(You may also need to assign a custom normalizer to the locationName_keyword field, if the duplicate locations have a slightly different locationName: different case, ...)
Note however that the "total hit count" in the Search result will indicate the number of hits before collapsing. So if there's only one matching locationName, but 5 Location instances with that name, the total hit count will be 5, but users will only see one hit. They'll be confused for sure.
That being said, it might be worth having another look at your situation to determine whether collapsing is really necessary here:
As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
If you have multiple documents with the same locationName, then surely you have multiple rows in the database with the same locationName? Duplication doesn't appear spontaneously when indexing.
I would say the first thing to do would be to step back, and consider whether you really want to query the Location entity, or if another, related entity wouldn't make more sense. When two locations have the same name, do they have a relationship to another, common entity instance (e.g. of type Shop, ...)?
=> If so, you should probably query that entity type instead (.search(Shop.class)), and take advantage of #IndexedEmbedded to allow filtering based on Location properties (i.e. add #IndexedEmbedded to the location association in the Shop entity type, then use the field location.locationName when adding a predicate that should match the location name).
If there is no such related, common entity instance, then I would try to find out why locations are duplicated exactly, and more importantly why that duplication makes sense in the database, but not to users:
Are the users not interested in all the locations? Then maybe you should add another filter to your query (by "type", ...) that would help remove duplicates. If necessary, you could even run multiple search queries: first one with very strict filters, and if there are no hits, fall back to another one with less strict filters.
Are you using some kind of versioning or soft deletion? Then maybe you should avoid indexing soft-deleted entities or older versions; you can do that with conditional indexing or, if that doesn't work, with a filter in your search query.
If your data really is duplicated (legacy database, ...) without any way to pick a duplicate over another except by "just picking the first one", you could consider whether you need an aggregation instead of full-blown search. Are you just looking for the top location names, or maybe a count of locations by name? Then aggregations are the right tool.
I have this Neo4j Node class:
@Node
@Data
@AllArgsConstructor
public class Person {

    @Id
    @GeneratedValue
    private Long id;

    private Long parentId;

    @Relationship(type = "PARENT_OF", direction = Relationship.Direction.OUTGOING)
    private List<Person> children;

    public Person addChild(Person person) {
        person.setParentId(this.id);
        this.children.add(person);
        return this;
    }
}
I would like to build a query to use with the Spring Data #Query annotation, in order to fetch a list of genealogical trees, where the roots have null parentId. For each root I also would like to fetch their children, for each child their own children, etc.
The best I could come up with, so far, is the following:
public interface PersonRepository extends Neo4jRepository<Person, Long> {

    @Query("""
            MATCH (person:Person)
            WHERE person.parentId IS NULL
            OPTIONAL MATCH (person)-[parentOf:PARENT_OF]->(children)
            RETURN person, collect(parentOf), collect(children)
            SKIP 0
            LIMIT 10
            """)
    List<Person> findAllGenealogicalTrees();
}
but it doesn't seem to do what I'm looking for, as it seems to only fetch the children of the roots, but not the children of the children.
Is something wrong with my query?
EDIT:
Tried the suggested query:
MATCH path=(person)-[parentOf:PARENT_OF*]->(child)
WHERE person.parentId IS NULL
AND NOT (child)-[:PARENT_OF]->()
RETURN path
but the resulting list seems to be the following:
Person(id=0, parentId=null, children=[Person(id=1, parentId=0, children=[Person(id=3, parentId=1, children=[])])])
Person(id=1, parentId=0, children=[Person(id=3, parentId=1, children=[])])
Person(id=3, parentId=1, children=[])
I was expecting the first record only, since parentId should be null. How come it is returning two other records that have a non-null parentId?
I think we can agree that the first query by its nature does only one hop because you explicitly say with (person)-[parentOf:PARENT_OF]->(children) that you only want to find the direct children.
The suggestion @Graphileon gave goes in the right direction, but its RETURN part only provides an unordered set of nodes and relationships.
Spring Data Neo4j can only assume that all Persons have the same importance and thus returns a collection of all Persons.
What I would suggest is to stay with the path-based approach but modifying the return statement in a way that Spring Data Neo4j and you agree on ;)
MATCH path=(person)-[:PARENT_OF*]->(child:Person)
WHERE person.parentId IS NULL
RETURN person, collect(nodes(path)), collect(relationships(path))
Reference: https://docs.spring.io/spring-data/neo4j/docs/current/reference/html/#custom-queries.for-relationships.long-paths
Another approach could also be that you are using the so-called derived finder methods in your repository:
List<Person> findAllByParentIdIsNull();
Or if you want to have it pageable (don't forget some ordering because the data could get returned randomly otherwise):
Page<Person> findAllByParentIdIsNull(Pageable pageable);
This uses the internal query generator, which will do an explorative search (non-path-based queries) over the data with multiple cascading queries.
There are (in general) a few things to keep in mind when making the decision:
The path-based approach could really ramp up the memory usage in the database if you have a lot of hops and branches, which leads to relatively slow response times. I assume that won't be a problem for your domain above, but this is always something I would keep an eye on.
In both cases (path-based or cascading queries) you will end up with three buckets of data: the root node, all relationships and all related nodes. The mapping will take some time, because Spring Data Neo4j has to match every returned relationship with the right related node for each relationship it wants to map. There is nothing wrong with this; it is simply the consequence of having a cyclically mapped domain.
To get all the paths, you can use a variable-length pattern:
MATCH path=(person)-[parentOf:PARENT_OF*]->(child)
WHERE person.parentId IS NULL
AND NOT (child)-[:PARENT_OF]->()
RETURN path
I was asked this question in an interview today to which I explained the best to my abilities. But I still don't understand if this is the correct answer.
There is a cache which has Employee object as the key. The cache is populated with data from the database. Now there is a UI where we can enter either or all of the 3 attributes from the Employee object- name, ID and date of joining. Now this search would lead to multiple matching results. To achieve this we need to check in the cache for the data.
To this I replied saying that my map would be of the structure Map<Employee, EmployeeDetails>; for the same EmployeeDetails object, I will have multiple keys in the map. (EmployeeDetails is the class which contains the complete details of the Employee, including address etc. The Employee object just has 3 attributes: name, ID and date of joining.)
One of the key objects would have only the name populated, another only the ID, and a third only the date of joining, and then the combinations of attributes. So the map will have the following keys:
Employee object with only the name populated -> value would be the list of all the Employee objects with the same name.
Employee object with only the ID populated -> value would be the list of all the Employee objects with the same ID. Ideally the list size in this case should be 1.
Employee object with only the date of joining populated -> the list of all the Employee objects with the same date of joining.
Similarly, there would be a number of other keys; for some of them, all three attributes (name, ID and date of joining) would be populated.
In this way, I could have achieved the requirement to display all the matching employees in case only some of the attributes out of name, ID and date of joining are set on the UI.
I just want to understand if this is the correct way to achieve the outcome (display of list of matching results on the UI). Since I did not get selected, I believe there is something else which I possibly missed!
A reasonable short answer is to maintain 3 separate maps for each of the 3 fields, with each one mapping from each field value to the list of employees with that value for the field.
To perform a lookup, retrieve the lists for each of the values that the user specified, and then (if you have more than one criteria) iterate through the shortest one to filter out employees that don't match the other criteria.
In the cases where you have more than one criteria, one of them has to be name or ID. In real life, the lists for these fields will be very short, so you won't have to iterate through any large collections.
This solution essentially uses the maps as indexes and implements the query like a relational DB. If you were to mention that in an interview, you would get extra points, but you'd need to be able to back it up.
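The three-index-map idea can be sketched in plain Java like this. The Employee shape and the field names are assumptions based on the question; a null criterion means the user left that field empty on the UI. The lookup starts from one candidate list and filters it against the remaining criteria:

```java
import java.time.LocalDate;
import java.util.*;

public class EmployeeIndex {
    public record Employee(String id, String name, LocalDate dateOfJoining) {}

    // One map per searchable field, each acting as an index.
    private final Map<String, List<Employee>> byId = new HashMap<>();
    private final Map<String, List<Employee>> byName = new HashMap<>();
    private final Map<LocalDate, List<Employee>> byDate = new HashMap<>();

    public void add(Employee e) {
        byId.computeIfAbsent(e.id(), k -> new ArrayList<>()).add(e);
        byName.computeIfAbsent(e.name(), k -> new ArrayList<>()).add(e);
        byDate.computeIfAbsent(e.dateOfJoining(), k -> new ArrayList<>()).add(e);
    }

    // Pick a candidate list from one index, then filter by the other criteria.
    public List<Employee> search(String id, String name, LocalDate date) {
        List<Employee> candidates;
        if (id != null)        candidates = byId.getOrDefault(id, List.of());
        else if (name != null) candidates = byName.getOrDefault(name, List.of());
        else if (date != null) candidates = byDate.getOrDefault(date, List.of());
        else return List.of(); // nothing entered on the UI

        List<Employee> result = new ArrayList<>();
        for (Employee e : candidates) {
            if (id != null && !e.id().equals(id)) continue;
            if (name != null && !e.name().equals(name)) continue;
            if (date != null && !e.dateOfJoining().equals(date)) continue;
            result.add(e);
        }
        return result;
    }
}
```

Starting the filter from the ID or name index keeps the candidate list short, as described above.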
One of the neat things about Java 8 is the Streams API. With this API, you can hold all of those Employee objects in a normal List and get the same results you were trying to achieve with multiple maps, with less overhead.
This API has a .filter() method that you can apply to a List that has been turned into a Stream, to return only the objects that meet the criteria described in the body of the filter:
List<Employee> emps = getEmps();
List<Employee> matchedEmps = emps.stream()
        .filter(e -> e.getID().equals(searchedID))
        .filter(e -> e.getName().equals(searchedName))
        .collect(Collectors.toList());
As you can see you can chain filters to match multiple criteria, although it may be more efficient just to have all matching done in one filter:
List<Employee> matchedEmps = emps.stream()
        .filter(e -> e.getID().equals(searchedID) && e.getName().equals(searchedName))
        .collect(Collectors.toList());
I would have a map with the Employee object as key and EmployeeDetails as value. I would get the Collection of values from the map, then create a custom Comparator for each specific search, iterate through the values collection and use the comparator to compare the values. The search results should be added to a results Collection during the iteration.
One way is to create a Map of Employee to EmployeeDetails; then, to search for a given employee ID, you have to iterate over all keys. The complexity will be O(N).
Second, to improve the time complexity, do what databases do and build indexes to avoid a full scan. Create index maps such as id -> Employee and email -> Employee; whenever you add an employee to the main map, also update the index maps.
Third, if possible, you can create a trie and put the Employee at the end node. After getting the Employee, you can look up its EmployeeDetails.
I need to query database for different combination of elements from the already received result object.
For instance, I get a list of Person entities, and for each Person entity I need to get the list of Addresses for that person.
There are two ways to do it:
Iterate the Person entity and fire a query for each Person entity to get the list of Addresses for that person.
Build a query dynamically with elements from Person entity and fire ONE single query to pull all addresses lists for all Persons and then iterate the Person entity again and match the Address list for each Person.
I don't know how many Person entities I might get, so which is the better approach in terms of performance and practice?
If I have 100 Person entities, the first approach means 100 queries, versus the second approach with one huge query like the one below:
from address where (person.id = 1 and person.zip = 393)
or (person.id = 2 and person.zip = 123)
or (person.id = 3 and person.zip = 345)
.... // 10 times.
Which one is better? Are there any restrictions or limitations on OR conditions in Oracle?
Is there a better approach? Batch queries?
You can use Hibernate with eager loading to directly get the results you want, by loading the Person entities with the required restrictions. Or, if you want to stick with lazy loading, try using an inner join between Person and Address, so that you get back a list of arrays containing the combined results.
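For the join approach, an HQL fetch join along these lines (assuming an addresses collection is mapped on Person; the names are illustrative) loads all persons together with their addresses in a single query:

```sql
select distinct p from Person p join fetch p.addresses
```

This avoids both the N+1 queries of the first approach and the hand-built OR list of the second.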
I have a table looking something like this:
@Table
public class Person {
    private String name;
    private String address;
    ...
    private String score;
}
In my database I now have a lot of persons with names, addresses and scores. Let's say I retrieve a list of persons from another system, where some of the persons already exist in the database and some are new.
Before I persist them in my DB I want to check if they already exist (avoid duplicates), and maybe change the score if the person I get in is the same as the one I already have, but with a different score.
What's the best query to write if I want to select all persons that already exist (e.g. same name and address)? My table of persons can contain a huge number of persons, and the list of persons I get from the other system is also big (new or with updated scores). I need a query that is all about performance :-).
I am using Java and Hibernate. Anyone?
EDIT: The final sql will probably look something like
select * from Person where name='Paul' AND address='road1'
OR name='John' AND address='road2'
OR name='Stella' AND address='road3'
and many, many more. The above SQL at least explains what I want.
One way of doing this is to outer join both tables and list all the persons that don't exist on one side, like this (T-SQL):
SELECT left.* FROM db1.owner.persons left
LEFT JOIN db2.owner.persons right ON left.name=right.name AND left.address=right.address
WHERE right.id IS NULL
Then you can use CreateSQLQuery method of ISession to get the list of persons.
in C# we write it like this
var list=session.CreateSQLQuery(queryString,"left",new []{typeof(Person)}).List();
but I don't think that's much different in Java.
To gain performance on this query, it's probably necessary to put some indexes on each table (on name and address, for example).
If I understand correctly, you already have all your "external" persons in memory.
I would create a Map<String, ExternalPerson> containing all your external persons indexed by name.
I would then ask the keySet() of this map to get the list of persons to get from the database.
I would then execute the following query:
select p from Person p where p.name in (:names)
You just have to make sure that the number of names isn't above the limit imposed by your database (1000 in Oracle). If it is, you'll have to split the set into several subsets, and repeat the query for each subset.
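The splitting can be sketched like this (the NameChunks class and the chunk-size parameter are illustrative; each chunk would then be bound to the :names parameter in its own query execution):

```java
import java.util.ArrayList;
import java.util.List;

public class NameChunks {
    // Split a list into consecutive chunks of at most maxSize elements,
    // e.g. to stay under Oracle's 1000-element IN-list limit.
    public static <T> List<List<T>> partition(List<T> items, int maxSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxSize) {
            chunks.add(items.subList(i, Math.min(i + maxSize, items.size())));
        }
        return chunks;
    }
}
```

You would then run `query.setParameterList("names", chunk)` once per chunk and merge the results.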
Then iterate over the query results. For each person found, get its corresponding external person using the map of external persons, and update the score of the current person. Then remove the external person from the map.
At the end of the process, the map contains the external persons that don't exist in the database, and must be created.
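The reconciliation step above can be sketched in plain Java (the Person shape is an assumption based on the question, and matching is done by name only, as in the query above):

```java
import java.util.*;

public class ScoreSync {
    public static class Person {
        public final String name;
        public String score;
        public Person(String name, String score) { this.name = name; this.score = score; }
    }

    // Index the external persons by name, walk the persons loaded from the
    // database, update their score from the map, and remove each matched entry.
    // Whatever remains in the map did not match anything and must be created.
    public static Collection<Person> sync(List<Person> fromDb, List<Person> external) {
        Map<String, Person> externalByName = new HashMap<>();
        for (Person p : external) externalByName.put(p.name, p);
        for (Person dbPerson : fromDb) {
            Person match = externalByName.remove(dbPerson.name);
            if (match != null) dbPerson.score = match.score; // update existing person
        }
        return externalByName.values(); // new persons to persist
    }
}
```

In the real code the `fromDb` list would come from the `in (:names)` query, and the returned collection would be passed to `session.save(...)` for each new person.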
If the set of persons is really high, make sure to use query.scroll() rather than query.list() to iterate through the persons, and regularly flush and clear the session as explained in this section of the reference manual, to avoid memory problems.