cypher indexing common property on all nodes - java

Under the new indexing rules, auto-indexing will go away in the future, and you are expected to create indexes using Cypher. With this new approach, to index a node property you MUST provide a node label.
I have a 'nodeId' property present on all types of Node Labels - User, Employee, Bank, Car, etc. I used to auto-index this property to retrieve any type of node if its nodeId is known. Please note that since auto-index did not require me to give a Node Label, it was possible for me to do what I did.
ReadableIndex<Node> readableIndex = this.graphDatabaseService.index().getNodeAutoIndexer().getAutoIndex();
readableIndex.get("nodeId", "0").getSingle();
But with the new style, I have to create an index on the nodeId property for each and every node label. So I have to do this:
create index on :User(nodeId)
create index on :Employee(nodeId)
...
Moreover, my method getByNodeId(String nodeId) is useless now because this cypher query IMHO will not be able to use the index anymore since I am not passing any node label.
match (node) where node.nodeId = {nodeId} return node;
Since the whole point of my getByNodeId() method was to be generic across all nodes, I cannot give this Cypher query a node label. So what should I do here? My 2 questions are:
How do I tell neo4j via Cypher to index a property on all node labels?
How do I write a Cypher query that uses an index based on a node property rather than on a node label?
Note:
It is essential for me to use Cypher because I am using neo4j-jdbc, and it has no method to create an auto-index or access the auto-indexer (at least not that I know of).
Some might suggest that I change neo4j.properties to enable auto-indexing there, but I don't like changing configuration files; I want to do it in my program. Anyway, that would only have solved the first issue. The second issue would still be there.

A node can have multiple labels.
Thus, if you make all your nodes share a common label, say Base (in addition to whatever labels they currently have), you can just have a single index that covers all your nodes:
CREATE INDEX ON :Base(nodeId)
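To make the idea concrete, here is a minimal sketch of the Cypher statements involved, wrapped in a small Java helper so they can be sent through whatever driver you use. The class name BaseLabelCypherSketch and its members are made up for illustration; the {nodeId} parameter syntax is simply copied from the question, and the labels are assumed to come from a trusted, hard-coded list because they are concatenated into the statement text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: it only builds Cypher text and does not talk to a database.
final class BaseLabelCypherSketch {
    // Single index covering every node that carries the shared Base label.
    static final String CREATE_INDEX = "CREATE INDEX ON :Base(nodeId)";

    // Generic lookup; matching on :Base lets the planner consider the index above.
    static final String FIND_BY_NODE_ID =
            "MATCH (node:Base) WHERE node.nodeId = {nodeId} RETURN node";

    // One statement per existing label, adding Base to the nodes that already exist.
    // Assumption: labels are trusted identifiers, since they are concatenated verbatim.
    static List<String> addBaseLabelStatements(List<String> existingLabels) {
        List<String> statements = new ArrayList<>();
        for (String label : existingLabels) {
            statements.add("MATCH (n:" + label + ") SET n:Base");
        }
        return statements;
    }
}
```

New nodes would then be created with both labels (for example :User:Base), so getByNodeId() can stay generic by always matching on :Base.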

Cloning references?

This may be a too-specific question, but I'm hoping there's a more general solution to my problem.
I have a class. In this class is a tree-type structure with many parent/child nodes. Also in this class is an Array filled with references to each node in this tree-type structure.
The tree's purpose is for each node to know where to draw itself on screen (every node has relative positional information based on its parent's location).
The Array's purpose is the draw order: I simply draw whatever node is referenced at Array[0] first, and so on. (So the nodes aren't necessarily drawn in the order they appear in the tree.)
My problem is this. I want to clone this overall class that contains these two objects (tree with nodes and an Array that references said nodes). This seems simple.
I create a deep copy of the tree structure and the nodes it contains. Cool.
However, I don't know how to repopulate a new Array with references to these new nodes in this new tree. It seems like it would be simple but I can't figure out how to do it.
I tried to be specific enough to give enough information without being too confusing, hope you understand.
Thanks.
If you're able to change the node data structure, you could add a field for the node's array index. That way, once you've rebuilt your tree you can just walk through it and use the index field to repopulate the array. Not super elegant, but it gets the job done...
Or, to avoid adding a field to your node class, I suppose you could use a temporary hashtable that maps nodes to array indices. Walk through your source array to populate the hashtable, then once you've cloned the tree, walk through the tree, looking up the new nodes in the hashtable (which will work fine assuming you've implemented equals and hashCode properly) and populating the array from those.
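A rough sketch of the hashtable idea follows. It is a variant rather than exactly what is described above: the map is keyed by object identity (IdentityHashMap) and is consulted while the tree is being copied, so equals and hashCode do not need to be implemented. The Node and Scene classes and their fields are invented for the example.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class SceneCloneSketch {
    // Hypothetical node type: a name plus child references.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Hypothetical container holding the tree and the draw-order array.
    static final class Scene {
        Node root;
        Node[] drawOrder;
    }

    static Scene deepCopy(Scene source) {
        // Remember at which draw-order slot each original node sits.
        Map<Node, Integer> slotOf = new IdentityHashMap<>();
        for (int i = 0; i < source.drawOrder.length; i++) {
            slotOf.put(source.drawOrder[i], i);
        }
        Scene copy = new Scene();
        copy.drawOrder = new Node[source.drawOrder.length];
        copy.root = cloneNode(source.root, slotOf, copy.drawOrder);
        return copy;
    }

    // Copies a subtree and drops each clone into the slot its original occupied.
    private static Node cloneNode(Node original, Map<Node, Integer> slotOf, Node[] newOrder) {
        Node clone = new Node(original.name);
        Integer slot = slotOf.get(original);
        if (slot != null) {
            newOrder[slot] = clone;
        }
        for (Node child : original.children) {
            clone.children.add(cloneNode(child, slotOf, newOrder));
        }
        return clone;
    }
}
```

After deepCopy, the new array references only nodes of the new tree, in the same draw order as the original.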

How to create a simple unordered tree(not BST) in java with given node pairs(u,v)?

The problem is that I don't understand how to create a tree. I have gone through many code examples on trees, but I don't even know how to work with a node, and hence I don't understand how the node class that was present in all the example programs works. When I try to use methods such as appendChild (as mentioned in the Java docs), I get an error and am asked to create such an appendChild method inside the node class in my main program. I couldn't understand why that happened.
I am given integer pairs((u,v) meaning there is an edge between u & v) of nodes and I also need to know if any Element-to-node conversion is required for using u and v(of type integer) as nodes.
Please bear with me since my basics are weak. Little explanation on how the entire thing works/functions would be very helpful.
Thank you.
EDIT 1: I went through the following links:(hardly found anything on just unordered trees) http://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/code/BST.java http://www.newthinktank.com/2013/03/binary-tree-in-java/ . Tried to modify these codes to meet my own purpose but failed.
I only got a blurry idea, and that is not enough for an implementation. I am trying to make a simple unordered tree, for which I am given (u,v) pairs like:
(4,5) (5,7) (5,6). I just need to join (4<--5), (5<--7) and (5<--6). So how do I write a node class that only joins one node to the previous node? Besides, to do only this, do I need to bother with leftchild and rightchild? If not, how will I be able to traverse the tree and do similar operations, such as height and diameter calculation, later?
Thank you for your patience.
Well, it's not entirely clear whether you want an explanation of tree creation in general, of some tree implementation you have found, or whether you already have basic code that you cannot get working. You might want to clarify that :).
As for tree creation in general:
Most explanations and implementations you will find might be "overly" elegant :). So try to imagine a simple linked list first. In that list you have nodes (the elements of the list). A node contains some valuable data and a reference to another node object. A very simple tree differs only in that a node has more than one reference to other nodes. For example, it has a Set of references: its "children".
Here is a VERY crude code only as an example for addChild() or appendChild in your case:
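import java.util.HashSet;
import java.util.Set;

public class Node {
    private String valueableData;  // payload stored in this node
    private Set<Node> children;    // references to this node's child nodes

    public Node() {
        this.children = new HashSet<Node>();
    }

    public Node(String valueableData) {
        this.valueableData = valueableData;
        this.children = new HashSet<Node>();
    }

    // Attaches the given node as a child of this node.
    public void addChild(Node node) {
        children.add(node);
    }
}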
Now, this implementation would be quite horrible (I even deleted the setters/getters). It would also be wise to keep a reference to the root node in some cases, and so on. But you might get the basic idea.
You might want to use a loop or recursion to go over the (u,v) integer pairs. Either create the root node first and then addChild() all the other nodes recursively, or create every node first and then addChild() them according to your rules.
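The "create every node first, then link them" route could look roughly like this. It is only a sketch under some assumptions: in each pair the first number is taken to be the parent and the second the child (which is how (4<--5) reads in the question), the nodes hold plain int ids, and the names PairTreeSketch, TreeNode, build and height are invented here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PairTreeSketch {
    // A node of an unordered tree: an id and any number of children.
    static final class TreeNode {
        final int id;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(int id) { this.id = id; }
    }

    // Each pair is {parent, child} (assumption). Returns every node, keyed by its id.
    static Map<Integer, TreeNode> build(int[][] pairs) {
        Map<Integer, TreeNode> nodes = new HashMap<>();
        for (int[] pair : pairs) {
            TreeNode parent = nodes.computeIfAbsent(pair[0], id -> new TreeNode(id));
            TreeNode child = nodes.computeIfAbsent(pair[1], id -> new TreeNode(id));
            parent.children.add(child);
        }
        return nodes;
    }

    // Height counted in edges: a leaf has height 0.
    static int height(TreeNode node) {
        int tallestChild = -1;
        for (TreeNode child : node.children) {
            tallestChild = Math.max(tallestChild, height(child));
        }
        return tallestChild + 1;
    }
}
```

No leftchild/rightchild fields are needed: traversals such as height() simply loop over the children list and recurse.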

Creating a Graph Query Language (Node/Edge/HyperEdge)

I'm creating an API that encapsulates JPA objects with additional properties and helpers. I do not want the users to access the database, because I have to provide certain querying functionality for the consumers of the API.
I have the following:
Node1(w/ attributes) -- > Edge1(w/ attr.) -- > Node2(w/ attr.)
and
Node1(w/ attributes) -- > |
Node2(w/ attributes) -- > | -- > HyperEdge1(w/ attr.)
Node3(w/ attributes) -- > |
Basically a Node can be of a certain type, which would dictate the kind of attributes available. So I need to be able to query these "paths" depending on different types and attributes.
For example: Start from a Node, and find a path typeA > typeB & attr1 > typeC.
So I need to do something simple, and be able to write the query as a string, or maybe a builder pattern style.
What I have so far, is a visitor pattern set up to traverse the Nodes/Edges/HyperEdges, and this allows for a sort of querying, but it's not very simple, since you have to create a new visitor for new types of queries.
This is my implementation so far:
ConditionImpl hasMass = ConditionFactory.createHasMass( 2.5 );
ConditionImpl noAttributes = ConditionFactory.createNoAttributes();
List<ConditionImpl> conditions = new ArrayList<ConditionImpl>();
conditions.add( hasMass );
conditions.add( noAttributes );
ConditionVisitor conditionVisitor = new ConditionVisitor( conditions );
node.accept( conditionVisitor );
List<Set<Node>> validPaths = conditionVisitor.getValidPaths();
The code above runs a query that checks whether the starting node has a mass of 2.5 and a linked node (child) has no attributes. The visitor calls condition.check( Node ), which returns a boolean.
Where do I start with creating a querying language for a graph that is simpler?
Note: I do not have the option of using an existing graph library, and I will have hundreds of thousands of nodes, plus the edges.
Personally, I like the idea of the visitor pattern; however, it might turn out to be too expensive to visit all nodes.
Query Interface: If users / other developers are using it, I would use a builder style interface, with readable method names:
Visitor v = QueryBuilder
    .selectNodes(ConditionFactory.hasMass(2.5))
    .withChildren(ConditionFactory.noAttributes())
    .buildVisitor();
node.accept(v);
List<Set<Node>> validPaths = v.getValidPaths();
As pointed out above, this is more or less just syntactic sugar for what you already have (but sugar makes all the difference). I would separate the code for "moving on the graph" (like "check whether visited node fulfills condition" or "check whether connected nodes fulfill condition") from the code that actually checks (or is) a condition. Also, use composites on conditions to build and/or:
// Select nodes with mass 2.5, follow edges with both conditions fulfilled and check that the children on these edges have no attributes.
Visitor v = QueryBuilder
    .selectNodes(ConditionFactory.hasMass(2.5))
    .withEdges(ConditionFactory.and(ConditionFactory.freestyle("att1 > 12"), ConditionFactory.freestyle("att2 > 23")))
    .withChildren(ConditionFactory.noAttributes())
    .buildVisitor();
(I used "freestyle" because of missing creativity right now, but its intention should be clear.) Note that in general these might be two different interfaces, so that strange queries cannot be built.
public interface QueryBuilder {
    QuerySelector selectNodes(Condition c);
    QuerySelector allNodes();
}

public interface QuerySelector {
    QuerySelector withEdges(Condition c);
    QuerySelector withChildren(Condition c);
    QuerySelector withHyperChildren(Condition c);
    // ...
    QuerySelector and(QuerySelector... selectors);
    QuerySelector or(QuerySelector... selectors);
    Visitor buildVisitor();
}
Using this kind of syntactic sugar makes the queries readable from the source code without forcing you to implement your own data query language. The QuerySelector implementations would then be responsible for "moving" around the visited nodes, whereas the Condition implementations would check whether the condition matches.
The clear downside of this approach is that you need to foresee most of the queries in the interfaces and implement them up front.
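As an illustration of the composite conditions mentioned above, here is a small sketch. It does not use the question's ConditionImpl or Node types; instead, a node is reduced to a map of attribute names to numeric values, and the names ConditionSketch, greaterThan, and, or and noAttributes are assumptions made for the example.

```java
import java.util.Map;

final class ConditionSketch {
    // A condition only answers "does this set of attributes qualify?".
    interface Condition {
        boolean check(Map<String, Double> attributes);
    }

    // Composite: true only if every part is true.
    static Condition and(Condition... parts) {
        return attributes -> {
            for (Condition part : parts) {
                if (!part.check(attributes)) return false;
            }
            return true;
        };
    }

    // Composite: true if at least one part is true.
    static Condition or(Condition... parts) {
        return attributes -> {
            for (Condition part : parts) {
                if (part.check(attributes)) return true;
            }
            return false;
        };
    }

    // Leaf condition: the named attribute exists and exceeds the threshold.
    static Condition greaterThan(String name, double threshold) {
        return attributes -> {
            Double value = attributes.get(name);
            return value != null && value > threshold;
        };
    }

    // Leaf condition: the node carries no attributes at all.
    static Condition noAttributes() {
        return attributes -> attributes.isEmpty();
    }
}
```

Because and/or return ordinary Condition objects, they can be nested freely, and the traversal code never needs to know whether it holds a leaf or a composite.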
Scalability with number of nodes: You might need to add some kind of index to speed up finding "interesting" nodes. One idea that comes to mind is to add (for each index) a layer to the graph in which each node models one of the different attribute settings for the "indexed variable". The normal edges could then connect these index nodes with the nodes in the original graph. The hyper edges on the index could then build a network which is smaller to search. Of course, there is still the boring way of storing the index in a map-like structure with an attributeValue -> node mapping, which is probably much more performant than the idea above anyway.
If you have some kind of index, make sure that the index can also receive a visitor, so that the visitor does not have to visit all nodes in the graph.
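The "boring" map-based index could be as small as the following sketch. The class name AttributeIndexSketch and its methods are hypothetical, and it handles exact-match lookups only; range queries would need a sorted structure instead.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Maps one attribute's values to the nodes that carry that value.
final class AttributeIndexSketch<V, N> {
    private final Map<V, Set<N>> byValue = new HashMap<>();

    // Registers a node under the value it has for the indexed attribute.
    void add(V attributeValue, N node) {
        byValue.computeIfAbsent(attributeValue, key -> new LinkedHashSet<>()).add(node);
    }

    // Returns the candidate start nodes for a query; never null.
    Set<N> lookup(V attributeValue) {
        Set<N> hits = byValue.get(attributeValue);
        return hits == null ? Collections.<N>emptySet() : Collections.unmodifiableSet(hits);
    }
}
```

A query for hasMass(2.5) could then start its visitor only from lookup(2.5) instead of walking the whole graph.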
It sounds like you have all the pieces except some syntactic sugar.
How about an immutable style where you create the whole list above like
Visitor v = Visitor.empty
    .hasMass(2.5)
    .edge()
    .node()
    .hasNoAttributes();
You can create any kind of linear query pattern using this style, and if you add some extra state you could even do branching queries, e.g. by calling setName("A") and later .node("A") to return to that point of the query.
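To show what "immutable style" means here, a minimal sketch: every call returns a new query object and leaves the old one untouched, so a partial query can be reused as the prefix of several others. This version only records the steps as strings; evaluating them against a graph is left out, and the class name PathQuerySketch and the steps() accessor are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PathQuerySketch {
    // Starting point, analogous to Visitor.empty in the snippet above.
    static final PathQuerySketch EMPTY = new PathQuerySketch(Collections.<String>emptyList());

    private final List<String> steps;

    private PathQuerySketch(List<String> steps) {
        this.steps = steps;
    }

    // Copies the existing steps, appends one, and wraps the result in a new object.
    private PathQuerySketch with(String step) {
        List<String> next = new ArrayList<>(steps);
        next.add(step);
        return new PathQuerySketch(Collections.unmodifiableList(next));
    }

    PathQuerySketch hasMass(double mass) { return with("hasMass(" + mass + ")"); }
    PathQuerySketch edge() { return with("edge"); }
    PathQuerySketch node() { return with("node"); }
    PathQuerySketch hasNoAttributes() { return with("hasNoAttributes"); }

    // The recorded pattern, in order; an interpreter would walk this list.
    List<String> steps() { return steps; }
}
```

For example, a shared prefix EMPTY.hasMass(2.5).edge() can be extended in two different ways without either extension affecting the other.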

Algorithm for finding objects with varying degrees of specificity

I have a large number of objects arranged in a tree-like structure (each node on the tree has parents and children, starting with one master node, and ending in many child nodes). Each object has its own ID in the form of a string, and there are many duplicate IDs, but no duplicates under the same parent. Example:
ParentA:
    childA
    childB
    childD
ParentB:
    childA
    childC
    childD
The tree is also many layers deep.
I need a method of finding objects that will work like this (example is based on the previous list):
Example 1:
an ArrayList with the string {"childB"} is passed to the algorithm
there are no duplicate nodes with an ID of "childB", so a reference to childB is returned
Example 2:
an ArrayList with the strings {"ParentA", "childD"} is passed to the algorithm
there are no duplicate nodes with an ID of "childD" AND a parent with an ID of "ParentA", so a reference to the given node is returned
Example 3:
an ArrayList with the string {"childD"} is passed to the algorithm
there are duplicate nodes with an ID of "childD" so the algorithm requests for more information (the name of the parent(s))
Keep in mind that there may be many levels of specificity, like {"nodeA", "nodeD", "nodeX", "nodeD"} so some kind of loop, or maybe a recursive method would be needed.
So, any ideas?
Update:
I created a depth-first-search algorithm to go through each node on the tree, and it works very well. The algorithm returns all nodes in the form of one ArrayList. All I need now is a way to select one based on varying degrees of specificity. Can anyone help with that?
The above three examples show what I need.
A depth-first search algorithm may be helpful here.
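Building on the list of all nodes that the depth-first search already produces, one possible way to do the selection is sketched below. It assumes each node keeps a reference to its parent, and it treats the list of strings as a chain of IDs ending at the wanted node (last element is the node, the one before it is its parent, and so on). The names SpecificitySketch, Item and find are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class SpecificitySketch {
    // Hypothetical tree element: an ID plus a link to its parent (null for the master node).
    static final class Item {
        final String id;
        final Item parent;
        Item(String id, Item parent) { this.id = id; this.parent = parent; }
    }

    // Returns every node whose own ID and ancestor IDs match the end of the given path.
    // One result means a unique hit; several results mean more parent names are needed.
    static List<Item> find(List<Item> allNodes, List<String> path) {
        List<Item> matches = new ArrayList<>();
        for (Item candidate : allNodes) {
            if (matchesPath(candidate, path)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    // Walks upward from the node while walking backward through the path.
    private static boolean matchesPath(Item node, List<String> path) {
        Item current = node;
        for (int i = path.size() - 1; i >= 0; i--) {
            if (current == null || !current.id.equals(path.get(i))) {
                return false;
            }
            current = current.parent;
        }
        return true;
    }
}
```

The caller checks the size of the returned list: 0 means not found, 1 is the answer, and anything larger is the "request more information" case from Example 3.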

How to build in Java a Weighted Directed Acyclic Graph

I did a search on similar topics, but the answers are too vague for my level of understanding and comprehension, and I don't think they're specific enough to my question.
Similar threads:
Tree (directed acyclic graph) implementation
Representing a DAG (directed acyclic graph)
Question:
I have formatted a text file which contains data of the following format...
Example dataset:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622
GO:0000112#part_of: GO:0000442
GO:0000118#is_a: GO:0016581#is_a: GO:0034967#is_a: GO:0070210#is_a: GO:0070211#is_a: GO:0070822#is_a: GO:0070823#is_a: GO:0070824
GO:0000120#is_a: GO:0000500#is_a: GO:0005668#is_a: GO:0070860
GO:0000123#is_a: GO:0005671#is_a: GO:0043189#is_a: GO:0070461#is_a: GO:0070775#is_a: GO:0072487
GO:0000126#is_a: GO:0034732#is_a: GO:0034733
GO:0000127#part_of: GO:0034734#part_of: GO:0034735
GO:0000133#is_a: GO:0031560#is_a: GO:0031561#is_a: GO:0031562#is_a: GO:0031563#part_of: GO:0031500
GO:0000137#part_of: GO:0000136
I'm looking to construct a weighted directed DAG from this data (the above is just a snippet). The whole dataset of 106kb is here: Source
--------------------------------------------------
Taking into consideration line-by-line, the data of each line is explained as follows...
First line as an example:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622
'#' is the delimiter/tokenizer for the line data.
The First term, GO:0000109 is the node name.
The subsequent terms of is_a: GO:xxxxxxx OR part_of: GO:xxxxxxx are the nodes which are connected to GO:0000109.
Some of the subsequent terms have connections too, as depicted in the dataset.
When it is is_a, the weight of the edge is 0.8.
When it is part_of, the weight of the edge is 0.6.
--------------------------------------------------
I have Googled what DAGs are, and I understand the concept. However, I still have no idea how to put it into code. I'm using Java.
From my understanding, a graph generally consists of nodes and arcs. Does this graph require an adjacency list to determine the direction of the connection? If so, I'm not sure how to combine the graph and adjacency list to communicate with each other.
After constructing the graph, my secondary goal is to find out the degree of each node from the root node. There is a root node in the dataset.
For illustration, I have drawn out a sample of the connection of the first line of data below:
Image Link
I hope you guys understand what I'm trying to achieve here. Thanks for looking through my problem. :)
Because it's easier to think about, I'd prefer to represent it as a tree. (Also makes it easier to traverse the map and keep intermediate degrees.)
You could have a Node class, which would have a Collection of child Node objects. If you must, you could also represent the child relationships as a Relationship object, which would have both a weight and a Node pointer, and you could store a Collection of Relationship objects.
Then you could do a walk on the tree starting from the root, and mark each visited node with its degree.
class Node {
    String name;
    List<Relationship> children;
}

class Relationship {
    Node child;
    double weight;
}

class Tree {
    Node root;
}
Here, Tree should probably have a method like this:
public Node findNodeByName(String name);
And Node should probably have a method like this:
public void addChild(Node n, double weight);
Then, as you parse each line, you call Tree.findNodeByName() to find the matching node (and create one if none exists... but that shouldn't happen, if your data is good), and append the subsequent items on the line to that node.
As you've pointed out, DAGs cannot really be converted to trees, especially because some nodes have multiple parents. What you can do is insert the same node as the child of multiple parents, perhaps using a hash table to decide if a particular node has been traversed or not.
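For the parsing step, a sketch along these lines might help. Instead of the Node/Relationship classes above it uses a plain adjacency list keyed by node name, so a node with several parents exists only once in the map. It assumes that edges point from the first term on a line to each subsequent term, that every relation looks like "is_a: GO:xxxxxxx" or "part_of: GO:xxxxxxx", and the names GoGraphSketch, Edge, addLine and edgesFrom are invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GoGraphSketch {
    // A weighted, directed edge to the node named by target.
    static final class Edge {
        final String target;
        final double weight;
        Edge(String target, double weight) { this.target = target; this.weight = weight; }
    }

    // Adjacency list: node name -> outgoing edges. Every node seen gets an entry.
    private final Map<String, List<Edge>> adjacency = new LinkedHashMap<>();

    // Parses one line such as "GO:0000127#part_of: GO:0034734#part_of: GO:0034735".
    void addLine(String line) {
        String[] tokens = line.trim().split("#");
        String source = tokens[0].trim();
        List<Edge> edges = adjacency.computeIfAbsent(source, key -> new ArrayList<>());
        for (int i = 1; i < tokens.length; i++) {
            String token = tokens[i].trim();
            int separator = token.indexOf(": ");
            if (separator < 0) continue;              // skip malformed tokens
            String relation = token.substring(0, separator);
            String target = token.substring(separator + 2).trim();
            double weight;
            if (relation.equals("is_a")) weight = 0.8;
            else if (relation.equals("part_of")) weight = 0.6;
            else continue;                            // unknown relation type
            edges.add(new Edge(target, weight));
            adjacency.computeIfAbsent(target, key -> new ArrayList<>());
        }
    }

    List<Edge> edgesFrom(String name) {
        List<Edge> edges = adjacency.get(name);
        return edges == null ? Collections.<Edge>emptyList() : edges;
    }

    int nodeCount() { return adjacency.size(); }
}
```

Calling addLine for every line of the file yields the whole graph; a breadth-first walk from the root over edgesFrom() would then give each node's distance from the root.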
Reading the comments, you seem confused by how a Node can contain Relationships, each of which in turn contains a Node. This is quite a common strategy; in general it is called the Composite pattern.
The idea in the case of trees is that the tree can be thought of as consisting of multiple subtrees: if you were to disconnect a node and all its descendants from the tree, the disconnected nodes would still make a tree, though a smaller one. Thus, a natural way to represent a tree is to have each Node contain other Nodes as children. This approach lets you do many things recursively, which in the case of trees is often, again, natural.
Letting a Node keep track of its children and no other parts of the tree also emulates the mathematical directed graph - each vertex is "aware" only of its edges and nothing else.
Example recursive tree implementation
For instance, to search for an element in a binary search tree, you would call the root's search method. The root then checks whether the sought element is equal to, less than, or greater than its own value. If it is equal, the search exits with an appropriate return value. If it is less or greater, the root instead calls search on its left or right child, respectively, and that child does exactly the same thing.
Analogously, to add a new Node to the tree, you would call the root's add method with the new node as a parameter. The root decides whether it should adopt the new node or pass it on to one of its children. In the latter case, it would select a child and call its add method with the new Node as a parameter.
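The two paragraphs above translate fairly directly into code. The following is a bare-bones sketch with int values; the class name BstNodeSketch is invented, add() takes a value rather than a ready-made node for brevity, and duplicates are silently ignored, which is just one possible policy.

```java
final class BstNodeSketch {
    final int value;
    BstNodeSketch left;   // subtree holding smaller values
    BstNodeSketch right;  // subtree holding larger values

    BstNodeSketch(int value) { this.value = value; }

    // Each node compares against itself, then delegates to exactly one child.
    boolean search(int sought) {
        if (sought == value) return true;
        BstNodeSketch next = sought < value ? left : right;
        return next != null && next.search(sought);
    }

    // Each node either adopts the new value as a direct child or passes it down.
    void add(int newValue) {
        if (newValue == value) return;  // duplicate: nothing to do
        if (newValue < value) {
            if (left == null) left = new BstNodeSketch(newValue);
            else left.add(newValue);
        } else {
            if (right == null) right = new BstNodeSketch(newValue);
            else right.add(newValue);
        }
    }
}
```

Note that no node ever looks beyond its own children; the recursion does the rest.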
