Spring Data Neo4j memory consumption on "supernode" entities - java

As far as I know, once a NodeEntity in Spring Data Neo4j is loaded, the default behaviour is to lazily load its relations by fetching only the ids of related nodes.
While this seems fine in most situations, I have doubts about it in the case of so-called "supernodes" - nodes that have numerous relations to other nodes. Such nodes, even if small by themselves, will hold a huge collection of ids, using more memory than we would like and, in effect, not being "lazily loaded enough"...
So my question is - how shall I deal with that kind of supernode?
My first idea is simply to remove all @RelatedTo/@RelatedToVia mappings (or at least the ones whose relationship types are "numerous") from such nodes, bypass SDN whenever operations on those relations are needed, and use SDN everywhere else.
Does that make sense? Do you have other suggestions, or experience with this kind of situation?
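For illustration, a minimal sketch of that bypass idea, assuming an SDN 3.x-era Neo4jTemplate and invented entity/relationship names: the heavy relationship stays unmapped, and is only touched through paged Cypher.

    import java.util.Collections;
    import java.util.Map;

    import org.springframework.data.neo4j.annotation.GraphId;
    import org.springframework.data.neo4j.annotation.NodeEntity;
    import org.springframework.data.neo4j.conversion.Result;
    import org.springframework.data.neo4j.support.Neo4jTemplate;

    // Hypothetical supernode: the numerous TAGGED relations are simply
    // not mapped, so SDN never materialises their ids on load.
    @NodeEntity
    class Tag {
        @GraphId Long id;
        String name;
    }

    class TagRelations {
        private final Neo4jTemplate template;

        TagRelations(Neo4jTemplate template) {
            this.template = template;
        }

        // Bypass SDN for the heavy relationship: page through related ids
        // with raw Cypher instead of holding them all in memory.
        Result<Map<String, Object>> taggedPostIds(long tagId, int skip, int limit) {
            Map<String, Object> params = Collections.<String, Object>singletonMap("id", tagId);
            return template.query(
                "MATCH (t:Tag)<-[:TAGGED]-(p) WHERE id(t) = {id} "
                + "RETURN id(p) AS postId SKIP " + skip + " LIMIT " + limit,
                params);
        }
    }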

I have not worked with SDN, but I would give the metanode approach a try. With this approach you build a structure that splits the total number of relations across several metanodes: if a node has 1000 connections and you use 10 metanodes, each metanode holds 100 connections while the supernode itself keeps just 10. You can see a graphical representation in the following image: http://i.stack.imgur.com/DMQGs.png.
This way you have good control over how many relations a node can have, and therefore over how many nodes SDN will load at most.
You can read more about it in http://neo4j.com/book-learning-neo4j/ and also in this similar post: Neo4j how to avoid supernodes
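For illustration, a rough sketch of the metanode layout using the embedded Neo4j Java API (SDN-independent; the META/FOLLOWS relationship names and the fan-out of 10 are assumptions):

    import org.neo4j.graphdb.Direction;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.graphdb.Transaction;

    // supernode -[:META]-> metanode_0..metanode_9, and the actual FOLLOWS
    // relationships attach to the metanodes, not to the supernode itself.
    public class MetanodeWriter {

        enum RelTypes implements RelationshipType { META, FOLLOWS }

        private static final int FAN_OUT = 10; // metanodes per supernode

        public void addFollower(GraphDatabaseService db, Node supernode, Node follower) {
            try (Transaction tx = db.beginTx()) {
                int bucket = (int) (follower.getId() % FAN_OUT); // spread evenly
                Node metanode = findOrCreateMetanode(db, supernode, bucket);
                follower.createRelationshipTo(metanode, RelTypes.FOLLOWS);
                tx.success();
            }
        }

        private Node findOrCreateMetanode(GraphDatabaseService db, Node supernode, int bucket) {
            for (Relationship r : supernode.getRelationships(RelTypes.META, Direction.OUTGOING)) {
                Node m = r.getEndNode();
                if (Integer.valueOf(bucket).equals(m.getProperty("bucket", null))) {
                    return m;
                }
            }
            Node m = db.createNode();
            m.setProperty("bucket", bucket);
            supernode.createRelationshipTo(m, RelTypes.META);
            return m;
        }
    }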

For supernodes I would simply not map the relationship on the supernode entity, but only on the related nodes.
If you're interested in the relationship, you either look up the related node and follow it to the supernode,
or, if you really need to load the millions of relationships, use a Cypher statement.
You can also hang the many relationships off a separate node created for that purpose, or add a tree-like substructure, which also lets you deal with sub-selections.
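As a hedged sketch of that advice (entity names invented, SDN 3.x-style annotations assumed): the mapping lives only on the low-degree side, and Cypher is reserved for the rare full traversal.

    import org.neo4j.graphdb.Direction;
    import org.springframework.data.neo4j.annotation.GraphId;
    import org.springframework.data.neo4j.annotation.NodeEntity;
    import org.springframework.data.neo4j.annotation.Query;
    import org.springframework.data.neo4j.annotation.RelatedTo;
    import org.springframework.data.neo4j.repository.GraphRepository;

    // The supernode entity carries no collection of its followers.
    @NodeEntity
    class Celebrity {
        @GraphId Long id;
        String name;
    }

    // The relationship is mapped only on the related, low-degree node:
    // loading a Fan pulls in exactly one id, loading a Celebrity pulls none.
    @NodeEntity
    class Fan {
        @GraphId Long id;
        @RelatedTo(type = "FOLLOWS", direction = Direction.OUTGOING)
        Celebrity follows;
    }

    interface CelebrityRepository extends GraphRepository<Celebrity> {
        // Only when the millions of relationships are really needed,
        // and then paged, via an explicit Cypher statement.
        @Query("MATCH (c)<-[:FOLLOWS]-(f) WHERE id(c) = {0} RETURN f SKIP {1} LIMIT {2}")
        Iterable<Fan> findFollowers(Long celebrityId, int skip, int limit);
    }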

First, can you provide the version of SDN you are using, so the question can be targeted at the right maintainers of the library?
Secondly, while I don't really know the internals of SDN, I have worked heavily with other OGMs, and my understanding of lazy loading is quite different from the one you describe, for the simple reason that lazily holding on to ids can be harmful: you can end up with corrupted data if another process deletes one of the nodes behind those ids.
Generally, and this is quite common in other OGMs, when an object has no annotations representing relationships, you just recreate the object from its metadata and the loaded node.
If it does have relationships, however, you create a proxy of that object which extends the entity itself.
The entity values on the proxy are not instantiated at first; instead you override all getters and add to the proxy the methods for retrieving the related nodes (so the entity manager is injected into the proxy).
So basically, a proxy stays empty until you call one of its getters.
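To make that concrete, a toy sketch of such a proxy (all names invented; a real OGM would generate something similar):

    import java.util.List;

    // Minimal stand-ins so the sketch compiles; a real OGM provides these.
    interface EntityManagerLike {
        <T> List<T> loadRelated(long nodeId, String relType, Class<T> type);
    }

    class Person {
        public List<Person> getFriends() {
            throw new UnsupportedOperationException("not loaded");
        }
    }

    // The proxy stays "empty" until one of its getters is called.
    class PersonProxy extends Person {
        private final EntityManagerLike em; // the injected entity manager
        private final long nodeId;
        private List<Person> friends;       // not populated up front

        PersonProxy(EntityManagerLike em, long nodeId) {
            this.em = em;
            this.nodeId = nodeId;
        }

        @Override
        public List<Person> getFriends() {
            if (friends == null) { // first access triggers the fetch
                friends = em.loadRelated(nodeId, "FRIEND", Person.class);
            }
            return friends;
        }
    }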
You can also "fine-grain" this behavior by creating Custom repositories that extend the default one, in the sense you can choose to only LAZY_LOAD one type of relationships and EAGER_LOAD the others.
The method described by albert makes lot of sense in some cases, however it is hard to accomplish on the basic OGM side, you would better have a BehaviorComponent that will handle this for you during lifecycle events, or add some kind of pagination to the getter method, which I think is not part of the OGM right now.

Related

Can planning entities be added or removed automatically by OptaPlanner?

I am new to OptaPlanner. I thought I had understood what planning entities are, as well as planning variables, whether genuine or shadow ones such as inverse-relation variables. I have started studying the documentation, the examples and old StackOverflow questions, but some doubts remain.
While trying to make my score calculator incremental, I found some unexpected methods in the IncrementalScoreCalculator interface. Together with beforeVariableChanged and afterVariableChanged, I find *EntityAdded and *EntityRemoved, which makes me suspect that entity objects may be added and removed. Moreover, these methods are implemented in the documented NQueens example; but in the kind of examples I am looking at - distributing shifts, resources, time slots, etc. - the domain is designed in such a way that planning entities are expected to be modified, but not added or removed.
I don't know whether adding/removing entity objects is used anywhere, e.g. in route planning problems, which I haven't dived into, and whether such additions and removals are explicit or implicit. So: might planning entities be added or removed by OptaPlanner without being asked to?
No, OptaPlanner out of the box will not add or remove planning entity instances, because the default move selectors only modify planning entities; they don't create or destroy them.
OptaPlanner doesn't have any generic move selectors yet that can do that (and once we do, they won't be on by default).
If you write a custom move (see MoveListFactory and MoveIteratorFactory in the docs), then you could choose to add/remove entities in your moves, which is why those methods exist, but very few users do that.
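For context, a hedged sketch of an incremental calculator against the OptaPlanner 7.x-era interface (the domain classes are invented stand-ins; the package and score factory methods changed in later versions): when your moves never add or remove entities, the entity callbacks simply mirror the insert/retract already done for variable changes.

    import java.util.ArrayList;
    import java.util.List;

    import org.optaplanner.core.api.score.Score;
    import org.optaplanner.core.api.score.buildin.simple.SimpleScore;
    import org.optaplanner.core.impl.score.director.incremental.IncrementalScoreCalculator;

    // Hypothetical domain stand-ins for a shift-distribution problem.
    class ShiftAssignment { Long employeeId; }
    class ShiftSchedule {
        List<ShiftAssignment> assignments = new ArrayList<ShiftAssignment>();
    }

    // The *EntityAdded/*EntityRemoved callbacks must still be implemented,
    // but if entities are never added/removed they are barely exercised.
    public class ShiftScheduleIncrementalCalculator
            implements IncrementalScoreCalculator<ShiftSchedule> {

        private int unassignedCount; // toy constraint: penalise unassigned shifts

        public void resetWorkingSolution(ShiftSchedule schedule) {
            unassignedCount = 0;
            for (ShiftAssignment a : schedule.assignments) {
                insert(a);
            }
        }

        public void beforeEntityAdded(Object entity) { /* nothing to retract yet */ }
        public void afterEntityAdded(Object entity) { insert((ShiftAssignment) entity); }

        public void beforeVariableChanged(Object entity, String variableName) {
            retract((ShiftAssignment) entity);
        }
        public void afterVariableChanged(Object entity, String variableName) {
            insert((ShiftAssignment) entity);
        }

        public void beforeEntityRemoved(Object entity) { retract((ShiftAssignment) entity); }
        public void afterEntityRemoved(Object entity) { /* already retracted */ }

        private void insert(ShiftAssignment a) {
            if (a.employeeId == null) unassignedCount++;
        }
        private void retract(ShiftAssignment a) {
            if (a.employeeId == null) unassignedCount--;
        }

        public Score calculateScore() {
            return SimpleScore.valueOf(-unassignedCount);
        }
    }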

JPA, complex object graphs and Serialisation

I have a "Project" entity/class that includes a number of "complex" fields, eg referenced as interfaces with many various possible implementations. To give an example: an interface Property, with T virtually of any type (as many types as I have implemented).
I use JPA. For those fields I have had no choice but to actually serialize them to store them. Although I have no need to use those objects in my queries, this is obviously leading to some issues, eg maintenance/updates to start with.
I have two questions:
1) is there a "trick" I could consider to keep my database up to date in case I have a "breaking" change in my serialised class (most of the time serialisation changes are handled well)?
2) will moving to JDO help at all? I very little experience with JDO but my understanding is that with JDO, having serialised objects in the tables will never happen (how are changes handled though?).
In support to 2) I must also add that the object graphs I have can be quite complex, possibly involving 10s of tables just to retrieve a full "Project" for instance.
JDO obviously supports persistence of interface fields (DataNucleus JPA also allows their persistence, but as a vendor extension). Having an interface field whose value can be of any possible type presents problems to an RDBMS rather than to JDO as such. The underlying datastore is more your problem (in not being able to adequately mirror your model), and one of the many other datastores could help you with that. For example, DataNucleus JDO/JPA supports GAE/Datastore, Neo4j, MongoDB, HBase, ODF, Excel, etc., and simply persists the "id" of the related object in a "column" (or equivalent) of the owning object's representation ... so such breaking changes would be much less severe than what you have now.
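For reference, a minimal sketch of the serialized-field setup the question describes (all names invented): the interface-typed value is stored as one opaque blob, and about the only "trick" plain Java serialization offers against breaking changes is a pinned serialVersionUID plus tolerant custom readObject logic in each implementation.

    import java.io.Serializable;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Lob;

    // The interface-typed value must itself be Serializable.
    interface Property<T> extends Serializable {
        T getValue();
    }

    @Entity
    class Project {
        @Id
        Long id;

        // Stored as a single blob via Java serialization: queries cannot
        // see inside it, and a "breaking" class change makes old rows
        // unreadable unless every Property implementation pins its
        // serialVersionUID and tolerates missing/renamed fields.
        @Lob
        Property<?> mainProperty;
    }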

Custom hibernate entity persister

I am in the process of performance testing/optimizing a project that maps
a document <--> Java object tree <--> MySQL database
The document, the Java classes, the database schema and the mapping logic are orchestrated with HyperJaxb3. The ORM piece of it is JPA, provided by Hibernate.
There are about 50 different entities and obviously lots of relationships between them. A major feature of the application is to load documents and then reorganize the data into new documents; all the pieces of each incoming document eventually get sent out in one outgoing document. While I would prefer not to be living in the relational world, the transactional semantics are a very good fit for this application - there is a lot of money and government regulation involved, so we need to make sure everything gets delivered exactly once.
Functionally, everything is going well and performance is decent (after a fair amount of tweaking). Each document is made up of a few thousand entities which end up creating a few thousand rows in the database. The documents vary in size, and insert performance is pretty much proportional to the number of rows that need to be inserted (no surprise there).
I see the potential for a significant optimization, and this is where my question lies.
Each document is mapped to a tree of entities. The "leaf" half of the tree contains lots of detailed information that is not used in the decisions for how to generate the outgoing documents. In other words, I don't need to be able to query/filter by the contents of many of the tables.
I would like to map the appropriate entity sub-trees to blobs, and thus save the overhead of inserting/updating/indexing the majority of the rows I am currently handling the usual way.
It seems that my best bet is to implement a custom EntityPersister and associate it with the appropriate entities. Is this the right way to go? The Hibernate docs are not bad, but EntityPersister is a fairly complex class to implement, and I am left with lots of questions after looking at the javadoc. Can you point me to a concrete yet simple example that I can use as a starting point?
Any thoughts about another way to approach this optimization?
I've run into the same problem with storing large amounts of binary data. The solution I found works best is to denormalize the object model. For example, I create a master record and a second entity that holds the binary data. On the master, use a @OneToOne mapping to the secondary entity, but mark the association as lazy. Now the data will only be loaded if you need it.
The one thing that might slow you down is the outer join that Hibernate performs with all objects of this type. To avoid it, you can mark the association as mandatory (optional = false). But if the database doesn't hand you a huge performance hit, I suggest you leave it alone. I found that Hibernate has a tendency to load the binary data immediately if I tried to use a regular join.
Finally, if you need to retrieve a lot of the binary data in a single SQL call, use HQL's join fetch. For example: from Article a join fetch a.data, where a.data is the one-to-one relationship to the binary holder. The HQL compiler treats this as an instruction to get all the data in a single SQL call.
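A sketch of that layout, reusing the Article/ArticleData names from the example above (standard JPA annotations; everything else assumed):

    import javax.persistence.CascadeType;
    import javax.persistence.Entity;
    import javax.persistence.FetchType;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Lob;
    import javax.persistence.OneToOne;

    // Master row stays small and fully queryable.
    @Entity
    class Article {
        @Id @GeneratedValue
        Long id;
        String title;

        // optional = false ("mandatory") lets Hibernate proxy the
        // association, so the blob is fetched only on first access.
        @OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL)
        ArticleData data;
    }

    // The bulky, never-queried sub-tree lives in its own table as a blob.
    @Entity
    class ArticleData {
        @Id @GeneratedValue
        Long id;

        @Lob
        byte[] payload; // the serialized entity sub-tree
    }

With this mapping, plain loads of Article leave the payload untouched, while "from Article a join fetch a.data" pulls both rows in one SQL call when you really do need everything.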
HTH

Flex Blaze DS and JPA - lazy-loading issues

I am developing an application in Flex, using BlazeDS to communicate with a Java back-end, which provides persistence via JPA (EclipseLink).
I am encountering issues when passing JPA entities to Flex via BlazeDS. BlazeDS uses reflection to convert the JPA entity into an ObjectProxy (effectively a HashMap) by calling all getter methods on the entity - including those for any lazily-initialised one/many-to-many relationships.
You can probably see where I am going. If I pass back a single entity, all of its one/many-to-many getters get called, and for each returned object with one/many-to-many relationships of its own, those getters are called too. So by passing back a single JPA entity I actually end up making multiple database calls and shipping all related entities back inside a single ObjectProxy instance!
My solution to date is to create a translator that converts each entity to an ObjectProxy and vice versa. This is clearly cumbersome, and there must be a better way.
Thoughts please?
As an alternative, you could consider using GraniteDS instead of BlazeDS: GraniteDS has a much more powerful data management stack than BlazeDS (it competes more with LCDS) and fully supports lazy loading for all major JPA engines: Hibernate, EclipseLink, OpenJPA, etc.
Moreover, GraniteDS has a great client-side transparent lazy-loading feature and even a so-called reverse lazy-loading mechanism.
And you don't need any intermediate DTOs: it serializes JPA entities as-is and uses code-generated ActionScript beans on the client side to keep track of their initialization state.
Unfortunately, lazy loading is not easy to accomplish with Flash clients. There are some working solutions, like dpHibernate, but so far all the solutions I have tested fall short of what you would expect in terms of performance and ease of use.
So in my experience, the best and most reliable solution is to always use DTOs, which has the added benefit of cleanly separating the database and view layers. This requires, though, that you implement either eager loading or a second server round trip to resolve your many-to-many relations, as well as a good deal of boilerplate code to copy the field values between DAO and DTO.
Which one to choose depends on your use case: sometimes getting only the main object's fields is enough, in which case you simply omit the list of related objects from your DTO (transfer only the values you need for your query). Sometimes you actually need the entire list of related entities, and then you can get it via eager loading, or by setting up a second remote object to fetch only the list.
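To make the DTO route concrete, a trivial hand-rolled translator (all names invented; a mapping library could replace the copy code):

    import java.util.ArrayList;
    import java.util.List;

    // Minimal entity stand-ins (your real JPA entities would be used instead).
    class Customer { String name; }
    class Item { Long id; }
    class Order {
        Long id; Customer customer; List<Item> items;
    }

    // Flat, Flex-safe DTO carrying only what the client needs.
    class OrderDto {
        Long id;
        String customerName;
        List<Long> itemIds = new ArrayList<Long>();

        static OrderDto from(Order entity) {
            OrderDto dto = new OrderDto();
            dto.id = entity.id;
            dto.customerName = entity.customer.name;
            // Resolve the relation here, inside the transaction, on purpose:
            // BlazeDS then serializes plain data instead of walking proxies.
            for (Item item : entity.items) {
                dto.itemIds.add(item.id);
            }
            return dto;
        }
    }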
EclipseLink also provides a copyObject() API that lets you specify a copy group of exactly the attributes you want. You can then use this copy to avoid carrying the relationships you don't want.
If you have a detached object, you could also just null out the fields that you don't want, or use a DTO.
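A hedged sketch of that copy-group idea against the EclipseLink 2.x-era API (verify the exact unwrap/copy calls for your EclipseLink version; entity and attribute names are invented):

    import javax.persistence.EntityManager;
    import org.eclipse.persistence.jpa.JpaEntityManager;
    import org.eclipse.persistence.sessions.CopyGroup;

    class EntityCopier {
        // Returns a copy containing only the listed attributes; unlisted
        // relationships are simply absent, so BlazeDS has nothing to walk.
        static Object copyForClient(EntityManager em, Object entity) {
            CopyGroup group = new CopyGroup();
            group.addAttribute("id");    // attribute names are illustrative
            group.addAttribute("title");
            return em.unwrap(JpaEntityManager.class)
                     .getActiveSession()
                     .copy(entity, group);
        }
    }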

Saving tree-structures in Databases

I use Hibernate/Spring and a MySQL Database for my data management.
Currently I display a tree structure in a JTable. A tree can have several branches; a branch can in turn have several branches (up to nine levels) or leaves. Lately I have had performance problems as soon as I want to create new branches at deeper levels.
At the moment a branch has a foreign key to its parent. The domain object has access to its parent by calling getParent(), which returns the parent branch. The deeper the level, the longer it takes to create a new branch.
Microbenchmark results for creating a new branch are like:
Level 1: 32 ms.
Level 3: 80 ms.
Level 9: 232 ms.
Obviously the level (which means the number of parents) is responsible for this. So I wanted to ask if there are any approaches to work around this kind of problem. I don't understand why Hibernate needs to know about the whole object path (all parents up to the root) when creating a new branch. But as far as I know this can be the only reason for the delay, because a branch doesn't have any other relations to any other objects.
I would be very thankful for any workarounds or suggestions.
greets,
ymene
Basically you have some sort of many-to-one relationship structure, right?
In Hibernate everything depends on the mapping. Tweak your mapping: use a one-to-many relationship from parent to child with a java.util.Set.
Do not use ArrayList, because a List is ordered, so Hibernate will add an extra column just for that ordering.
Also check your lazy property: if you load a parent and have set lazy="false" on its child set, then all of its children will be loaded from the DB, which can hurt performance.
Also check the 'inverse' property for the children. If inverse is true on the child side, it means you can manage the child entity separately; otherwise you have to do it through the parent only.
Google around for inverse; it will surely help you - see also the mapping sketch below.
Thanks.
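A sketch of the suggested mapping with JPA annotations (field names assumed; the XML mapping is equivalent):

    import java.util.HashSet;
    import java.util.Set;
    import javax.persistence.Entity;
    import javax.persistence.FetchType;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.ManyToOne;
    import javax.persistence.OneToMany;

    @Entity
    class Branch {
        @Id @GeneratedValue
        Long id;

        // The child owns the foreign key (this is what inverse="true"
        // expresses in XML), so saving a new branch writes a single row.
        @ManyToOne(fetch = FetchType.LAZY)
        Branch parent;

        // A Set, not a List: no index column. mappedBy makes this the
        // inverse side, and lazy keeps a load from dragging in the subtree.
        @OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
        Set<Branch> children = new HashSet<Branch>();
    }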
I don't know how Hibernate handles this internally. However, there are different ways to store tree structures in a database. One that is quite efficient for many queries on the tree is the "nested set" approach - but it would basically give you the same kind of performance issue you're seeing (i.e. expensive insertion). If you need fast insertion or removal, I'd stick with what you have, i.e. a simple parent ID, and try to find out what Hibernate is doing all that time.
If you don't need to report on your data in SQL, you could just serialize your JTable's tree model to the database instead (perhaps using something like XStream). That way you wouldn't have to worry about expensive database queries that deal with trees.
One thing you can do is use the XML support in MySQL. This will give you a native ability to support hierarchies. I've never used MySQL's XML support, so I don't know if it is as full-featured as in other DBMSes (SQL Server and DB2 I know have great support; probably Oracle too, I would guess).
Note that I have never used Hibernate, so I don't know whether you could interface it with that, or whether you would have to write your own DB code in this case (my guess is you're going to be writing your own queries).
