I have this Neo4j Node class:
@Node
@Data
@AllArgsConstructor
public class Person {
    @Id
    @GeneratedValue
    private Long id;
    private Long parentId;
    @Relationship(type = "PARENT_OF", direction = Relationship.Direction.OUTGOING)
    private List<Person> children;

    public Person addChild(Person person) {
        person.setParentId(this.id);
        this.children.add(person);
        return this;
    }
}
I would like to build a query for the Spring Data @Query annotation that fetches a list of genealogical trees, where the roots have a null parentId. For each root I would also like to fetch its children, for each child its own children, and so on.
The best I could come up with, so far, is the following:
public interface PersonRepository extends Neo4jRepository<Person, Long> {
    @Query("""
            MATCH (person:Person)
            WHERE person.parentId IS NULL
            OPTIONAL MATCH (person)-[parentOf:PARENT_OF]->(children)
            RETURN person, collect(parentOf), collect(children)
            SKIP 0
            LIMIT 10
            """)
    List<Person> findAllGenealogicalTrees();
}
but it doesn't do what I'm looking for: it only fetches the children of the roots, not the children of the children.
Is something wrong with my query?
EDIT:
Tried the suggested query:
MATCH path=(person)-[parentOf:PARENT_OF*]->(child)
WHERE person.parentId IS NULL
AND NOT (child)-[:PARENT_OF]->()
RETURN path
but the resulting list seems to be the following:
Person(id=0, parentId=null, children=[Person(id=1, parentId=0, children=[Person(id=3, parentId=1, children=[])])])
Person(id=1, parentId=0, children=[Person(id=3, parentId=1, children=[])])
Person(id=3, parentId=1, children=[])
I was expecting only the first record, since it is the only one with a null parentId. Why is the query returning two other records that have a non-null parentId?
I think we can agree that the first query, by its nature, does only one hop: with (person)-[parentOf:PARENT_OF]->(children) you explicitly say that you only want to find the direct children.
The suggestion @Graphileon gave goes in the right direction, but its RETURN clause only provides an unordered set of nodes and relationships.
Spring Data Neo4j can only assume that all Persons have the same importance and thus returns a collection of all Persons.
What I would suggest is to stay with the path-based approach but modify the return statement in a way that Spring Data Neo4j and you agree on ;)
MATCH path=(person)-[:PARENT_OF*]->(child:Person)
WHERE person.parentId IS NULL
RETURN person, collect(nodes(path)), collect(relationships(path))
Reference: https://docs.spring.io/spring-data/neo4j/docs/current/reference/html/#custom-queries.for-relationships.long-paths
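To get a feel for the memory concern mentioned in this answer, here is a rough count of what `collect(nodes(path))` holds for one root. It is a sketch under an assumption that is not in the original: a complete tree in which every person has the same number of children. In a tree, the pattern `(person)-[:PARENT_OF*]->(child)` matches exactly one path per descendant, and a path ending at depth k contains k + 1 nodes, so ancestors are repeated across many paths. `PathVolume` is a hypothetical helper.

```java
// Hypothetical helper: counts what a variable-length path query returns for one root,
// assuming a complete tree where every person has `branching` children.
final class PathVolume {
    /** Node entries across all paths: b^k paths end at depth k, each holding k + 1 nodes. */
    static long nodeEntries(int branching, int depth) {
        long total = 0;
        long pathsAtDepth = 1;
        for (int k = 1; k <= depth; k++) {
            pathsAtDepth *= branching;       // one path per descendant at depth k
            total += pathsAtDepth * (k + 1); // root plus k further nodes per path
        }
        return total;
    }

    /** Distinct people in the same tree, root included. */
    static long distinctNodes(int branching, int depth) {
        long total = 1;
        long peopleAtDepth = 1;
        for (int k = 1; k <= depth; k++) {
            peopleAtDepth *= branching;
            total += peopleAtDepth;
        }
        return total;
    }
}
```

With three children per person and five generations below the root, the collected node lists hold 2004 entries for only 364 distinct people.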
Another approach is to use the so-called derived finder methods in your repository:
List<Person> findAllByParentIdIsNull();
Or if you want to have it pageable (don't forget some ordering because the data could get returned randomly otherwise):
Page<Person> findAllByParentIdIsNull(Pageable pageable);
This lets Spring Data Neo4j's internal query generator create the queries: it does an explorative search (non-path-based queries) over the data, using multiple cascading queries.
There are (in general) a few things to keep in mind when making the decision:
The path-based approach can really ramp up the memory usage in the database if you have a lot of hops and branches, which leads to relatively slow response times. I assume that won't be a problem for your domain above, but this is always something I would keep an eye on.
In both cases (path-based or cascading queries) you will end up with three buckets of data: the root node, all relationships, and all related nodes. The mapping will take some time because Spring Data Neo4j has to match every returned relationship with the right related node for each relationship it wants to map. There is nothing wrong with this; it is simply the consequence of having a cyclic mapped domain.
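The matching step described above can be pictured with the plain-Java sketch below. It is not Spring Data Neo4j's actual mapping code; `PersonNode`, `ParentOf` and `TreeAssembler` are hypothetical stand-ins. It takes the three buckets (root ids, related node ids, relationships), wires each relationship to its start and end node by id, ignores the duplicates that overlapping paths produce, and returns only the roots, which is the single-record result the asker expected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the mapped Person entity.
final class PersonNode {
    final long id;
    final List<PersonNode> children = new ArrayList<>();

    PersonNode(long id) {
        this.id = id;
    }
}

// Hypothetical stand-in for one returned PARENT_OF relationship.
final class ParentOf {
    final long parentId;
    final long childId;

    ParentOf(long parentId, long childId) {
        this.parentId = parentId;
        this.childId = childId;
    }
}

final class TreeAssembler {
    /** Wires the flat buckets into trees and returns only the root nodes. */
    static List<PersonNode> assemble(List<Long> rootIds, List<Long> relatedIds, List<ParentOf> relationships) {
        // One object per id, even if overlapping paths returned the same node many times.
        Map<Long, PersonNode> byId = new HashMap<>();
        for (long id : rootIds) byId.putIfAbsent(id, new PersonNode(id));
        for (long id : relatedIds) byId.putIfAbsent(id, new PersonNode(id));

        // Match every relationship with its start and end node, skipping repeats.
        Set<String> seen = new HashSet<>();
        for (ParentOf rel : relationships) {
            if (seen.add(rel.parentId + "->" + rel.childId)) {
                byId.get(rel.parentId).children.add(byId.get(rel.childId));
            }
        }

        // Only the roots become top-level results.
        List<PersonNode> roots = new ArrayList<>();
        for (long id : rootIds) roots.add(byId.get(id));
        return roots;
    }
}
```

Fed with the data from the question's EDIT (root 0, child 1, grandchild 3, with repeats), this yields one root whose subtree is 0 → 1 → 3.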
To get all the paths, you can use a variable-length pattern:
MATCH path=(person)-[parentOf:PARENT_OF*]->(child)
WHERE person.parentId IS NULL
AND NOT (child)-[:PARENT_OF]->()
RETURN path
Related
I am using the Hibernate SearchSession class in Java to perform a search against a database. The code I currently have to search a table looks something like this:
SearchSession searchSession = Search.session(entityManagerFactory.unwrap(SessionFactory.class).withOptions()
.tenantIdentifier("locations").openSession());
SearchResult<Location> result = searchSession.search(Location.class)
.where( f -> f.bool()
.must( f.match()
.field("locationName")
.matching((phrase)).fuzzy())
).fetch(page * limit, limit);
This search works and properly returns results from the database, but there is no uniqueness constraint on the locationName column and the database holds multiple records with the same value in locationName. As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before? Applying a uniqueness constraint to the database table isn't an option in this scenario, and we were hoping there's a way to filter out duplicate values in the session rather than taking the results from the search and removing duplicate values separately.
Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before?
Not really, at least not at the moment.
If you're using the Elasticsearch backend and are fine with going native, you can insert native JSON into the Elasticsearch request, in particular to use field collapsing.
I think something like this might work:
SearchResult<Location> result = searchSession.search( Location.class )
.extension( ElasticsearchExtension.get() )
.where( f -> f.bool()
.must( f.match()
.field("locationName")
.matching((phrase)).fuzzy())
)
.requestTransformer( context -> {
    // Ask Elasticsearch to collapse hits that share the same locationName_keyword value
    JsonObject collapse = new JsonObject();
    collapse.addProperty("field", "locationName_keyword");
    JsonObject body = context.body();
    body.add( "collapse", collapse );
} )
// You probably need a sort, as well:
.sort(f -> f.field("id"))
.fetch( page * limit, limit );
You will need to add a locationName_keyword field to your Location entity:
@Indexed
@Entity
public class Location {
    // ...
    @Id
    @GenericField(sortable = Sortable.YES) // Add this
    private Long id;
    // ...
    @FullTextField
    @KeywordField(name = "locationName_keyword", sortable = Sortable.YES) // Add this
    private String locationName;
    // ...
}
(You may also need to assign a custom normalizer to the locationName_keyword field if the duplicate locations have slightly different locationName values, e.g. different case.)
Note, however, that the "total hit count" in the search result will indicate the number of hits before collapsing. So if there's only one matching locationName but 5 Location instances with that name, the total hit count will be 5, but users will only see one hit. They'll be confused for sure.
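To make the hit-count caveat concrete, here is a plain-Java sketch of what collapsing does to a list of hits: it keeps the first hit for each value of the collapse key and drops the rest, while the total hit count still reflects every hit. `CollapseSketch` is a hypothetical name, `normalized` only approximates what a lowercase normalizer on `locationName_keyword` would produce, and Elasticsearch's real collapse picks the top hit per group according to the query's sort.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

final class CollapseSketch {
    /** Keeps only the first hit for each collapse key, preserving hit order. */
    static <T> List<T> collapse(List<T> hits, Function<T, String> key) {
        Set<String> seen = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T hit : hits) {
            if (seen.add(key.apply(hit))) { // first time this key shows up
                result.add(hit);
            }
        }
        return result;
    }

    /** A lowercase key, roughly what a lowercase normalizer would produce. */
    static String normalized(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}
```

Five hits such as "Paris", "Paris", "paris", "Lyon", "Paris" collapse to three without normalization and to two with it, yet the total hit count would still say 5.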
That being said, it might be worth having another look at your situation to determine whether collapsing is really necessary here:
As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
If you have multiple documents with the same locationName, then surely you have multiple rows in the database with the same locationName? Duplication doesn't appear spontaneously when indexing.
I would say the first thing to do would be to step back, and consider whether you really want to query the Location entity, or if another, related entity wouldn't make more sense. When two locations have the same name, do they have a relationship to another, common entity instance (e.g. of type Shop, ...)?
=> If so, you should probably query that entity type instead (.search(Shop.class)), and take advantage of @IndexedEmbedded to allow filtering based on Location properties (i.e. add @IndexedEmbedded to the location association in the Shop entity type, then use the field location.locationName when adding a predicate that should match the location name).
If there is no such related, common entity instance, then I would try to find out why locations are duplicated exactly, and more importantly why that duplication makes sense in the database, but not to users:
Are the users not interested in all the locations? Then maybe you should add another filter to your query (by "type", ...) that would help remove duplicates. If necessary, you could even run multiple search queries: first one with very strict filters, and if there are no hits, fall back to another one with less strict filters.
Are you using some kind of versioning or soft deletion? Then maybe you should avoid indexing soft-deleted entities or older versions; you can do that with conditional indexing or, if that doesn't work, with a filter in your search query.
If your data really is duplicated (legacy database, ...) without any way to pick a duplicate over another except by "just picking the first one", you could consider whether you need an aggregation instead of full-blown search. Are you just looking for the top location names, or maybe a count of locations by name? Then aggregations are the right tool.
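For the "count of locations by name" case, the result of a terms aggregation on a keyword field is essentially a map from each distinct value to its number of documents. The plain-Java sketch below only illustrates that result shape on an in-memory list of names; it is not Hibernate Search's aggregation API, and `NameCounts` is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class NameCounts {
    /** Number of locations per distinct name, sorted by name. */
    static Map<String, Long> countByName(List<String> locationNames) {
        return locationNames.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```

Each duplicated name shows up once, with its count, which is often what a UI needs instead of the individual duplicate rows.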
I have a unidirectional one-to-many relationship between Article and Comment:
@Entity
public class Article {
    @OneToMany(orphanRemoval = true)
    @JoinColumn(name = "article_id")
    private List<Comment> comments = new ArrayList<>();
    …
}
I set orphanRemoval=true in order to mark the "child" entity for removal when it's no longer referenced from the "parent" entity, e.g. when you remove the child entity from the corresponding collection of the parent entity.
Here is an example:
@Service
public class MyService {
    public Article modifyComment(Long articleId) {
        Article article = repository.findById(articleId);
        List<Comment> comments = article.getComments();
        // Calls a method which removes some comments from the collection based on some logic
        removeSomeComments(comments); // side effect
        modifyComments(comments); // side effect
        .....
        return repository.save(article);
    }
}
So I have some statements that perform actions on the collection, which will then get persisted in the database. In the example above, I get the article from the database, mutate it by deleting/modifying some comments, and then save it to the database.
I am not sure what the cleanest way is to modify collections of objects without too many side effects, which lead to error-prone code (my code is more complex and requires multiple mutations on the collection).
Since I am inside a transaction, any changes to the collection (adding, deleting or modifying children) will be persisted when the transaction commits.
However, I tried to refactor this code and write it in a more expressive, functional style:
public Article modifyComment(Long articleId) {
    Article article = repository.findById(articleId);
    List<Comment> updatedComments = article.getComments().stream()
            .filter(some logic..) // remove some comments from the list based on a filter
            .sorted()
            .filter(again some logic) // do more stuff
            .collect(Collectors.toList());
    article.setComments(updatedComments); // assigns a NEW list to the entity
    return repository.save(article);
}
I like this approach more, as it is short, concise and more expressive.
However, this doesn't work, since it throws:
A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance
That's because I am assigning a new list (updatedComments).
If I want to remove or modify children of the parent, I have to modify the contents of the list instead of assigning a new list.
So I had to do this at the end:
article.getComments().clear();
article.getComments().addAll(updatedComments);
repository.save(article);
Do you consider the second example a good practice?
I am not sure how to work with collections in JPA.
My business logic is more complex, and I want to avoid having 3-4 methods that mutate a given collection (attached to a Hibernate session) which was passed in as a parameter.
I think the second example has less potential for side effects because it doesn't mutate any input parameter. What do you think?
(I am using Spring-Boot 2.2.5)
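The clear()/addAll() pattern from the question can be kept in one place, so the business logic stays a pure list-to-list function and only one helper mutates the managed collection. This is a sketch: `CollectionUpdates.replaceContents` is a hypothetical helper, and plain strings stand in for `Comment` entities. It hands the transformation a copy, so whatever the transformation returns can never be a view of the list that is cleared afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class CollectionUpdates {
    /**
     * Applies a list-to-list transformation, then writes the result back into the SAME
     * list instance, so the collection Hibernate tracks is never replaced by a new one.
     */
    static <T> void replaceContents(List<T> managed, UnaryOperator<List<T>> transformation) {
        // Give the transformation a copy, so its result cannot be a view of `managed`.
        List<T> updated = transformation.apply(new ArrayList<>(managed));
        managed.clear();
        managed.addAll(updated);
    }
}
```

In the service this could read `CollectionUpdates.replaceContents(article.getComments(), comments -> comments.stream()...collect(Collectors.toList()))`, keeping the stream pipeline while the entity keeps its original collection instance.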
You can actually try to turn the predicate logic used in your filter
.filter(some logic..) //remove some comments from the list based on a filter
into a removeIf call and perform the modification in place:
Article article = repository.findById(articleId);
article.getComments().removeIf(...inverse of some logic...) //this
return repository.save(article);
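The equivalence behind this answer is that keeping the elements that pass a filter is the same as removing the elements that fail it, i.e. `removeIf(keep.negate())`. Below is a sketch with strings standing in for comments; `RemoveIfSketch` and the `spam` prefix are hypothetical. The difference is that `retainInPlace` mutates the same list instance (the one Hibernate tracks), whereas `filtered` builds a new list.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class RemoveIfSketch {
    /** Functional style: a NEW list with the elements that pass `keep`. */
    static <T> List<T> filtered(List<T> source, Predicate<T> keep) {
        return source.stream().filter(keep).collect(Collectors.toList());
    }

    /** In-place style: removes from the SAME list every element that fails `keep`. */
    static <T> void retainInPlace(List<T> managed, Predicate<T> keep) {
        managed.removeIf(keep.negate()); // the "inverse of some logic"
    }
}
```

Both end with the same contents; only the in-place version keeps the original collection instance.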
I'm trying to use the RoomEntity class as the generic parameter of a List, but List turns red (error), and the only fix the IDE suggests is changing List to Optional.
public interface RoomRepository extends CrudRepository<RoomEntity, Long> {
List<RoomEntity> findById(Long id);
}
RoomEntity class:
@Entity
@Table(name = "Room")
public class RoomEntity {
}
Are they the same?
List<RoomEntity> findById(Long id);
Optional<RoomEntity> findById(Long id);
Optional and List are two very different concepts.
The CrudRepository findAllById method returns an Iterable.
The findById method returns an Optional.
An Iterable can be used to access the elements of a collection one by one, so it can be used in conjunction with the List class.
An Optional is a container object which may or may not contain a non-null value (zero or one elements). If a value is present, it is a single element, never many elements like in a List.
The CrudRepository::findAllById method can be sent more than one ID and so can return more than one element; it does this by returning an Iterable you can use to go through the returned results one by one. The findById method can only be sent a single ID, so it returns that element wrapped in an Optional if it is present, or Optional.empty() if it is not.
If you want a list back because you intend to send in multiple IDs, use the findAllById method. If you want a specific element by ID only, use the findById method, but then you will have to unwrap the Optional it is returned in before you can use the element: with Optional.get together with Optional.isPresent, or by calling map or flatMap on it (for example inside a stream pipeline).
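To illustrate the two return shapes without a database, here is a hypothetical in-memory stand-in (`InMemoryRoomRepository`, storing room names instead of `RoomEntity` objects). Its method names mirror `CrudRepository`, but it is not Spring Data code: `findById` wraps zero or one result in an `Optional`, while `findAllById` returns an `Iterable` of everything it finds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in mimicking the two CrudRepository return shapes.
final class InMemoryRoomRepository {
    private final Map<Long, String> rooms = new HashMap<>();

    void save(long id, String name) {
        rooms.put(id, name);
    }

    /** Zero or one result, so the result is wrapped in an Optional. */
    Optional<String> findById(Long id) {
        return Optional.ofNullable(rooms.get(id));
    }

    /** Zero or more results, so the result is an Iterable (here a List). */
    Iterable<String> findAllById(Iterable<Long> ids) {
        List<String> found = new ArrayList<>();
        for (Long id : ids) {
            String room = rooms.get(id);
            if (room != null) { // unknown ids are simply skipped
                found.add(room);
            }
        }
        return found;
    }
}
```

A call such as `findById(1L).map(String::toUpperCase).orElse("none")` unwraps the value without calling get() directly, and falls back to "none" when the id is unknown.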
Spring Data JPA will fit the query result to the container type you declare.
If you ask for a List<>, Spring will initialize a list, add any row data to that list and return it for you. Hence it will:
Return empty list if no items found
Return populated list if items found
When you ask for an Optional<>, Spring understands that you want at most one row of data. It will be interpreted as getSingleResult() on javax.persistence.Query. Hence it will:
Return Optional.empty() if no items found
Return Optional.of(result) if exactly one match
Throw an exception if there is more than one match (the one I remember is NonUniqueResultException)
In your case, you find by id. It's unique on your table so Optional<> should fit your purpose.
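But note that List<RoomEntity> is a valid return type for derived query methods in general; the problem here is that findById is already declared in CrudRepository with an Optional return type, so redeclaring it as List<RoomEntity> findById(Long id); is what gives you the compiler error (turns red). If List itself is unresolved, check that you have imported the List interface.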
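The rule for an `Optional` return type described in this answer (zero rows, one row, more than one row) can be written as a small helper. It is a sketch only: `SingleResult.toOptional` is hypothetical, and `IllegalStateException` stands in for the exception the persistence layer actually throws.

```java
import java.util.List;
import java.util.Optional;

final class SingleResult {
    /**
     * No rows -> Optional.empty(), exactly one row -> Optional.of(row),
     * more than one row -> exception (IllegalStateException is only a stand-in).
     */
    static <T> Optional<T> toOptional(List<T> rows) {
        if (rows.isEmpty()) {
            return Optional.empty();
        }
        if (rows.size() > 1) {
            throw new IllegalStateException("Expected at most one result but got " + rows.size());
        }
        return Optional.of(rows.get(0));
    }
}
```

Because an id is unique, a lookup by id can only ever hit the first two branches, which is why Optional fits findById.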
The findById method is supposed to look for a single entity by its id. After all, ids are unique for every entity.
You can try to use findAllById, but I doubt it'll make much difference.
What Optional means is that there may or may not be a result. The isPresent method of Optional will indicate this.
By definition, your findById should always return 1 or 0 entities (according to the documentation for Spring Data method naming), since your id is a unique key and there cannot be more than one entry in your repository with that key value. So Optional suits this situation perfectly: it's either empty (no entry with such a key in the repository) or present with a specific value (there is an entry in the repository). If you want to query all entities by some non-unique key, let's say a name column, you can name your method findByName with a return type of Iterable<Entity>; when generating the implementation for your repository, Spring will then understand that there can be more than one entity in the result set.
The findById method is already predefined in the interface you are extending, so you couldn't change its return type anyway.
This might also be useful: https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#repositories.core-concepts
I have a performance problem with a Hibernate implementation that is far too costly.
I will try to explain my current implementation, which must be improved upon, using pseudo classes.
Let’s say I have the following POJO classes (the Entity classes are hibernate annotated "copies").
Country.java and CountryEntity.java
City.java and CityEntity.java
Inhabitant.java and InhabitantEntity.java
And I want to add a city to a country and save/persist it in the database, the new city arrives fully populated as a POJO.
Current code
CountryEntity countryEntity = CountryDao.fetch(someId);
Country country = CountryConverter(countryEntity);
country.getCities().add(newCity);
countryEntity = CountryEntityConverter(country);
CountryDao.save(countryEntity);
This results in a major performance problem. Let's say I have 200 cities, each with 10,000 inhabitants.
To add one new city, the converters will convert 200 x 10,000 = 2,000,000 inhabitants: InhabitantEntity --> Inhabitant --> InhabitantEntity.
This puts a tremendous load on the server, as new cities are added often.
It also feels unnecessary to convert all cities in the country just to persist and connect another one.
I am thinking of creating a light converter that doesn't convert all the fields, only the ones I need for some business logic during the addition of the city; the other fields would be kept unchanged. I don't know whether Hibernate is good enough to handle this scenario.
For example, if I save an entity with a lot of null fields and a cities list containing only one city, can I tell Hibernate to merge this with what is in the DB?
Or is there a different approach I can take to solve the performance problem while keeping the POJOs and entities separate?
Some code below showing my current "slow" implementation code.
Country.java (pseudo code)
private fields
private List<City> cities;
City.java (pseudo code)
private fields
private List<Inhabitant> inhabitants;
Inhabitant.java (pseudo code)
private fields
Currently I fetch a CountryEntity through a DAO Java class.
Then I have converter classes (Entities --> POJOs) that set all fields and initialize all lists.
I also have similar converter classes converting the other way (POJOs --> Entities).
Country CountryConverter(CountryEntity countryEntity) {
    Country country = new Country();
    country.setField(countryEntity.getField());
    for (CityEntity cityEntity : countryEntity.getCities()) {
        country.getCities().add(CityConverter(cityEntity));
    }
    return country;
}
City CityConverter(CityEntity cityEntity) {
    City city = new City();
    city.setField(cityEntity.getField());
    for (InhabitantEntity inhabitantEntity : cityEntity.getInhabitants()) {
        city.getInhabitants().add(InhabitantConverter(inhabitantEntity));
    }
    return city;
}
Inhabitant InhabitantConverter(InhabitantEntity inhabitantEntity) {
    Inhabitant inhabitant = new Inhabitant();
    inhabitant.setField(inhabitantEntity.getField());
    return inhabitant;
}
Thanks in advance /Farmor
I suspect what might be happening is that you don't have an index column on the association, so Hibernate is deleting and then inserting the child collection, as opposed to just adding to or deleting discrete objects to and from the child association.
If that is what's going on, you could try adding an @IndexColumn annotation to the get method for the child association. That will allow Hibernate to perform discrete inserts, updates, and deletes on association records, as opposed to having to delete and then re-insert. You would then be able to insert the new city and its new inhabitants without having to rebuild everything. (In newer Hibernate versions, @IndexColumn is deprecated in favour of the standard JPA @OrderColumn.)
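The question's own idea of not converting the whole country can be sketched as: convert only the new city and attach it to the country entity that was already fetched. Everything here is hypothetical and stripped down (no Hibernate, names instead of inhabitant objects, a static counter added only to show the amount of work). In a real setup the `CountryEntity` would be the managed instance returned by `CountryDao.fetch`, and how the new city gets persisted depends on the mapping, e.g. cascading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down versions of the question's classes.
final class InhabitantEntity {
    final String name;
    InhabitantEntity(String name) { this.name = name; }
}

final class CityEntity {
    final String name;
    final List<InhabitantEntity> inhabitants = new ArrayList<>();
    CityEntity(String name) { this.name = name; }
}

final class CountryEntity {
    final List<CityEntity> cities = new ArrayList<>();
}

final class City {
    final String name;
    final List<String> inhabitants;
    City(String name, List<String> inhabitants) { this.name = name; this.inhabitants = inhabitants; }
}

final class IncrementalConverter {
    static int inhabitantConversions = 0; // counts POJO -> entity conversions, for illustration only

    static CityEntity toEntity(City city) {
        CityEntity entity = new CityEntity(city.name);
        for (String inhabitant : city.inhabitants) {
            inhabitantConversions++;
            entity.inhabitants.add(new InhabitantEntity(inhabitant));
        }
        return entity;
    }

    /** Converts only the new city and attaches it to the already-fetched country entity. */
    static void addCity(CountryEntity country, City newCity) {
        country.cities.add(toEntity(newCity));
    }
}
```

Adding a city with 4 inhabitants to a country holding 3 cities of 5 inhabitants each performs 4 conversions instead of 19; with the question's numbers it would be 10,000 instead of 2,010,000.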
I have a list of objects, and each object in the list has a position that must not change unless explicitly changed; in other words, the objects are in a queue. What collection should I use in my entity class to store these objects, and how should that collection be annotated?
I currently have this
@Entity
class Foo {
    ...
    @OneToMany(mappedBy = "foo", cascade = CascadeType.ALL)
    List<Bar> bars = new ArrayList<Bar>();
    ...
}
If this is not possible with pure JPA: I'm using EclipseLink as my JPA provider, so EclipseLink-specific annotations are OK if nothing else helps.
EDIT: Note, people: the problem is not that Java wouldn't preserve the order (I know most collections do); the problem is that I don't know a smart way for JPA to preserve the order. Having an order id and making the query order by it is possible, but in my case maintaining the order id is laborious (because the user interface allows reordering of items), and I'm looking for a smarter way to do it.
If you want this to be ordered after round-tripping to SQL, you should provide some sort of ordering ID within the entity itself - SQL is naturally set-oriented rather than list-oriented. You can then either sort after you fetch them back, or make sure your query specifies the ordering too.
If you give the entity an auto-generated integer ID this might work, but I wouldn't like to guarantee it.
Use a sort order id, as Jon suggested, then add an @OrderBy annotation below the @OneToMany. This will order any query by the specified field.
As an example, if you add a new field called "sortId" to Bar, Foo would look like this:
@Entity
class Foo {
    ...
    @OneToMany(mappedBy = "foo", cascade = CascadeType.ALL)
    @OrderBy("sortId ASC")
    List<Bar> bars = new ArrayList<Bar>();
    ...
}
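Maintaining `sortId` after a UI reorder can be kept in one helper: move the element in the list, then renumber every `sortId` to its position. When the collection is loaded again, ordering by `sortId` (what `@OrderBy("sortId ASC")` asks for) restores that order. This is a sketch with a hypothetical, stripped-down `Bar` class and a hypothetical `Reordering` helper; renumbering touches every element, which is usually acceptable for short queues.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the Bar entity with the new sortId field.
final class Bar {
    final String name;
    int sortId;

    Bar(String name, int sortId) {
        this.name = name;
        this.sortId = sortId;
    }
}

final class Reordering {
    /** Moves one element to a new position and rewrites every sortId to match. */
    static void move(List<Bar> bars, int from, int to) {
        Bar moved = bars.remove(from);
        bars.add(to, moved);
        for (int i = 0; i < bars.size(); i++) {
            bars.get(i).sortId = i; // keep sortId equal to the list position
        }
    }

    /** Mimics loading with ORDER BY sortId ASC: rows may arrive in any order. */
    static List<Bar> loadOrdered(List<Bar> rowsInAnyOrder) {
        List<Bar> copy = new ArrayList<>(rowsInAnyOrder);
        copy.sort(Comparator.comparingInt(b -> b.sortId));
        return copy;
    }
}
```

Dragging the last of "a", "b", "c" to the top yields "c", "a", "b" with sortIds 0, 1, 2, and that order survives a reload in arbitrary row order.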
You can:
Sort a List before creation
Sort a List after creation
Use a collection that performs a sort on insert (TreeMap, TreeSet)
LinkedList implements the Queue interface in Java and allows you to add elements in the middle...
TBH, most of the collections are ordered, aren't they?
Check the docs, most say whether or not they are ordered.
It's worth trying LinkedList instead of ArrayList, however, as Jon said, you need to find a way of persisting the order information.
A solution will probably involve issuing an order number to each entry and storing the entries in a SortedMap, then converting that into a List if a List is what you need.
However, ORM could potentially be clever enough to do all the conversions for you if you stored the collection as LinkedList.
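The order-number idea can be sketched with a `TreeMap`, whose `values()` iterate in ascending key order. The gapped numbers (10, 20, 30) are an assumption of this sketch, chosen so an entry can be slotted in between without renumbering the others; `OrderedEntries` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class OrderedEntries {
    /** Converts entries keyed by order number into a List in ascending key order. */
    static <T> List<T> toList(SortedMap<Integer, T> byOrderNumber) {
        return new ArrayList<>(byOrderNumber.values()); // values() iterate in key order
    }

    /** Convenience factory so callers get a sorted map regardless of insertion order. */
    static <T> SortedMap<Integer, T> newQueue() {
        return new TreeMap<>();
    }
}
```

Entries put under 30, 10, 20 and then 15 come back as the entries for 10, 15, 20, 30, so the one inserted later lands between the first and second.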