I'm trying to implement an algorithm in Neo4j using the Java API. The algorithm is called GRAIL (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.169.1656&rep=rep1&type=pdf) and it assigns labels to a graph for later answering reachability queries.
The algorithm uses postorder depth-first search, but with a random traversal each time (in each traversal, the children of a node are visited in random order).
The Neo4j Java API has an algorithm that does this (https://github.com/neo4j/neo4j/blob/7f47df8b5d61b0437e39298cd15e840aa1bcfed8/community/kernel/src/main/java/org/neo4j/graphdb/traversal/PostorderDepthFirstSelector.java), but without the randomness, and I can't seem to find a way to add it.
My code has a traversal description to which I want to add a custom ordering (BranchOrderingPolicy) that achieves the algorithm described above, like this:
.order(postorderDepthFirst())
The answer to my question came rather easily, but only after a lot of thinking. I just had to alter the path expander (I created my own), which returns the relationships the traversal takes next, and add a single line of code there to randomize the relationships.
The code is:
public class CustomExpander implements PathExpander<String> {
    private final RelationshipType relationshipType;
    private final Direction direction;
    private final Integer times;

    public CustomExpander(RelationshipType relationshipType, Direction direction, Integer times) {
        this.relationshipType = relationshipType;
        this.direction = direction;
        this.times = times;
    }

    @Override
    public Iterable<Relationship> expand(Path path, BranchState<String> state) {
        List<Relationship> results = new ArrayList<>();
        for (Relationship r : path.endNode().getRelationships(relationshipType, direction)) {
            results.add(r);
        }
        Collections.shuffle(results);
        return results;
    }

    @Override
    public PathExpander<String> reverse() {
        return null;
    }
}
There's no such ordering by default in Neo4j; however, it should be possible to write one. TraversalBranch#next gives the next branch, so your implementation could gather all (or some) of the branches and pick one at random. State keeping will be slightly tricky, though, and I'd guess as memory-hungry as a breadth-first ordering. Neo4j keeps relationships in linked lists per node, so there's no easy way to pick one at random without first gathering all of them.
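Outside of the Neo4j API, the randomized postorder itself is easy to sketch on a plain adjacency list. This standalone version (a sketch, not Neo4j code; names like `RandomPostorder` are made up for illustration) shows the two essential ingredients GRAIL relies on: shuffle the children, then number the node only after visiting them:

```java
import java.util.*;

class RandomPostorder {
    // Assigns 1-based postorder numbers, visiting each node's children in random
    // order — the traversal GRAIL repeats several times to build interval labels.
    static int[] label(List<List<Integer>> adj, int root, Random rnd) {
        int[] post = new int[adj.size()];
        boolean[] seen = new boolean[adj.size()];
        dfs(adj, root, seen, post, new int[] {0}, rnd);
        return post;
    }

    private static void dfs(List<List<Integer>> adj, int v, boolean[] seen,
                            int[] post, int[] counter, Random rnd) {
        seen[v] = true;
        List<Integer> children = new ArrayList<>(adj.get(v));
        Collections.shuffle(children, rnd); // the key step: random child order
        for (int c : children)
            if (!seen[c]) dfs(adj, c, seen, post, counter, rnd);
        post[v] = ++counter[0]; // postorder: numbered only after all children
    }
}
```

Repeating `label` with different `Random` seeds yields the independent random postorders GRAIL uses for its interval labels.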
I need to write a piece of code using the Kruskal algorithm, which in turn needs the Union-Find algorithm.
This includes the methods Make-Set(x), Find-Set(x) and Union(x, y).
I need to implement them using linked lists, but I am not sure of how to start with the Make-Set method.
The Make-Set Method should create a set and make the first element into a key (to compare sets). How exactly would I declare a key using linked lists?
In short: how do I implement this pseudocode for linked lists in Java?
Make-Set(x)
x.p = x
x.rank = 0
Thanks for your help in advance!
I've heard this referred to in the past not as "Union-Find" but as a disjoint set. It isn't exactly a linked list, since the nodes do have a link, but they aren't necessarily linked up in a linear fashion. It's more like a tree where each node has a pointer to its parent and you can walk up the tree to the root.
I don't have much time right now, but here's a quick sketch of how I would implement it in Java:
class Disjoint {
Disjoint next;
Disjoint findSet() {
Disjoint head = this;
if (next != null) {
head = next.findSet();
next = head;
}
return head;
}
void union(Disjoint other) {
Disjoint us = this.findSet();
Disjoint them = other.findSet();
us.next = them;
}
}
Creating an instance is your Make-Set. What you call Find-Set I would call find head or find leader, maybe find identity. I've called it findSet here, though. It walks the chain to find the root of the tree. It also performs an optional operation; it snaps all the links on the way back out of the recursive call so that they all point directly at the root. This is an optimization to keep the chains short.
Finally, Union is implemented just by assigning one root's next pointer to point at the other set. I'm not sure what you intended with rank; if it's the size of the set, you can add a field for that and simply sum them when you union two sets. But you initialize it to 0 for a new set when I would expect it to be initialized to 1.
Two nodes a and b belong to the same set if a.findSet() == b.findSet(). If you need the nodes to carry some data, make the class generic and provide the data to the constructor, and add a getter:
class Disjoint<T> {
Disjoint<T> next;
T data;
public Disjoint(final T data) {
this.data = data;
}
public T getData() {
return data;
}
// rest of class identical except Disjoint replaced with Disjoint<T> everywhere
}
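As a quick sanity check, here is the non-generic class from above in use (the class body is repeated so the snippet is self-contained; `DisjointDemo` is added just for the demonstration):

```java
class Disjoint {
    Disjoint next;

    Disjoint findSet() {
        Disjoint head = this;
        if (next != null) {
            head = next.findSet();
            next = head; // path compression: point directly at the root
        }
        return head;
    }

    void union(Disjoint other) {
        Disjoint us = this.findSet();
        Disjoint them = other.findSet();
        us.next = them;
    }
}

class DisjointDemo {
    public static void main(String[] args) {
        Disjoint a = new Disjoint(), b = new Disjoint(),
                 c = new Disjoint(), d = new Disjoint();
        a.union(b);                                     // {a, b}
        b.union(c);                                     // {a, b, c}
        System.out.println(a.findSet() == c.findSet()); // true: same set
        System.out.println(a.findSet() == d.findSet()); // false: d is on its own
    }
}
```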
Now I know there have been previous questions regarding this algorithm, but I honestly haven't come across a simple Java implementation. Many people have copied and pasted the same code into their GitHub profiles, and it's irritating me.
So, as an interview exercise, I've set out to implement the algorithm using a different approach.
The algorithm turned out to be very challenging. I'm honestly lost on how to go about it; the logic just doesn't make sense to me. I've spent nearly 4 days straight sketching an approach, but to no avail.
Therefore please enlighten us with your wisdom.
I'm primarily basing the algorithm on this information: Intuition behind the Aho-Corasick string matching algorithm
It would be a big bonus if one can implement their own solution here.
But here's the following incomplete and not working solution which I got really stuck at:
If you're overwhelmed by the code: the main problem lies in the core algorithm of Aho-Corasick. We have already built the trie of dictionary words correctly.
The issue is, now that we have the trie structure, how do we actually implement the matching?
None of the resources were helpful.
public class DeterminingDNAHealth {
private Trie tree;
private String[] dictionary;
private Node FailedNode;
private DeterminingDNAHealth() {
}
private void buildMatchingMachine(String[] dictionary) {
this.tree = new Trie();
this.dictionary = dictionary;
Arrays.stream(dictionary).forEach(tree::insert);
}
private void searchWords(String word, String[] dictionary) {
buildMatchingMachine(dictionary);
HashMap<Character, Node> children = tree.parent.getChildren();
String matchedString = "";
for (int i = 0; i < 3; i++) {
char C = word.charAt(i);
matchedString += C;
matchedChar(C, matchedString);
}
}
private void matchedChar(char C, String matchedString) {
if (tree.parent.getChildren().containsKey(C) && dictionaryContains(matchedString)) {
tree.parent = tree.parent.getChildren().get(C);
} else {
char suffix = matchedString.charAt(matchedString.length() - 2);
if (!tree.parent.getParent().getChildren().containsKey(suffix)) {
tree.parent = tree.parent.getParent();
}
}
}
private boolean dictionaryContains(String word) {
return Arrays.asList(dictionary).contains(word);
}
public static void main(String[] args) {
DeterminingDNAHealth DNA = new DeterminingDNAHealth();
DNA.searchWords("abccab", new String[] {
"a",
"ab",
"bc",
"bca",
"c",
"caa"
});
}
}
I have setup a trie data structure which works fine. So no problem here
Trie.java
public class Trie {
public Node parent;
public Node fall;
public Trie() {
parent = new Node('⍜');
parent.setParent(new Node());
}
public void insert(String word) {...}
private boolean delete(String word) {...}
public boolean search(String word) {...}
public Node searchNode(String word) {...}
public void printLevelOrderDFS(Node root) {...}
public static void printLevel(Node node, int level) {...}
public static int maxHeight(Node root) {...}
public void printTrie() {...}
}
Same thing for Node.
Node.java
public class Node {
private char character;
private Node parent;
private HashMap<Character, Node> children = new HashMap<Character, Node>();
private boolean leaf;
// default case
public Node() {}
// constructor accepting the character
public Node(char character) {
this.character = character;
}
public void setCharacter(char character) {...}
public char getCharacter() {...}
public void setParent(Node parent) {...}
public Node getParent() {...}
public HashMap<Character, Node> getChildren() {...}
public void setChildren(HashMap<Character, Node> children) {...}
public void resetChildren() {...}
public boolean isLeaf() {...}
public void setLeaf(boolean leaf) {...}
}
I usually teach a course on advanced data structures every other year, and we cover Aho-Corasick automata when exploring string data structures. There are slides available here that show how to develop the algorithm by optimizing several slower ones.
Generally speaking, I’d break the implementation down into four steps:
Build the trie. At its core, an Aho-Corasick automaton is a trie with some extra arrows tossed in. The first step in the algorithm is to construct this trie, and the good news is that this proceeds just like a regular trie construction. In fact, I’d recommend just implementing this step by pretending you’re just making a trie and without doing anything to anticipate the later steps.
Add suffix (failure) links. This step in the algorithm adds in the important failure links, which the matcher uses whenever it encounters a character that it can’t use to follow a trie edge. The best explanation I have for how these work is in the linked lecture slides. This step of the algorithm is implemented as a breadth-first search walk over the trie. Before you code this one up, I’d recommend working through a few examples by hand to make sure you get the general pattern. Once you do, this isn’t particularly tricky to code up. However, trying to code this up without fully getting how it works is going to make debugging a nightmare!
Add output links. This step is where you add in the links that are used to report all the strings that match at a given node in the trie. It's implemented through a second breadth-first search over the trie, and again, the best explanation I have for how it works is in the slides. The good news is that this step is actually a lot easier to implement than suffix link construction, both because you'll be more familiar with how to do the BFS and because you'll know how to walk down and up the trie. Again, don't attempt to code this up unless you can comfortably do this by hand! You don't need my code, but you don't want to get stuck debugging code whose high-level behavior you don't understand.
Implement the matcher. This step isn’t too bad! You just walk down the trie reading characters from the input, outputting all matches at each step and using the failure links whenever you get stuck and can’t advance downward.
I hope this gives you a more modular task breakdown and a reference about how the whole process is supposed to work. Good luck!
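The four steps above can be condensed into a small standalone sketch (class and method names are my own; it reports matches as {end index in the text, pattern index} pairs, which is one of several reasonable output formats):

```java
import java.util.*;

class AhoCorasick {
    // One trie node: children, a failure link, and indices of patterns ending here.
    static class TrieNode {
        Map<Character, TrieNode> children = new HashMap<>();
        TrieNode fail;
        List<Integer> output = new ArrayList<>();
    }

    private final TrieNode root = new TrieNode();

    AhoCorasick(String[] patterns) {
        // Step 1: build the trie, exactly as for a plain trie.
        for (int i = 0; i < patterns.length; i++) {
            TrieNode cur = root;
            for (char c : patterns[i].toCharArray())
                cur = cur.children.computeIfAbsent(c, k -> new TrieNode());
            cur.output.add(i);
        }
        // Steps 2 and 3: add failure links and merge output lists via BFS.
        Deque<TrieNode> queue = new ArrayDeque<>();
        for (TrieNode child : root.children.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            TrieNode cur = queue.poll();
            for (Map.Entry<Character, TrieNode> e : cur.children.entrySet()) {
                char c = e.getKey();
                TrieNode child = e.getValue();
                // Follow fail links until some node has an edge on c (or we hit root).
                TrieNode f = cur.fail;
                while (f != root && !f.children.containsKey(c)) f = f.fail;
                child.fail = f.children.containsKey(c) ? f.children.get(c) : root;
                child.output.addAll(child.fail.output); // inherited matches (output links)
                queue.add(child);
            }
        }
    }

    // Step 4: the matcher. Returns {endIndex, patternIndex} pairs for every match.
    List<int[]> search(String text) {
        List<int[]> matches = new ArrayList<>();
        TrieNode cur = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (cur != root && !cur.children.containsKey(c)) cur = cur.fail;
            cur = cur.children.getOrDefault(c, root);
            for (int p : cur.output) matches.add(new int[] {i, p});
        }
        return matches;
    }
}
```

Running it on the question's own example, dictionary {"a","ab","bc","bca","c","caa"} against text "abccab", produces seven matches, which is a handy hand-checkable test case.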
You're not going to get a good understanding of the Aho-Corasick string matching algorithm by reading some source code. And you won't find a trivial implementation because the algorithm is non-trivial.
The original paper, Efficient String Matching: An Aid to Bibliographic Search, is well written and quite approachable. I suggest you download that PDF, read it carefully, think about it a bit, and read it again. Study the paper.
You might also find it useful to read others' descriptions of the algorithm. There are many, many pages with text descriptions, diagrams, Powerpoint slides, etc.
You probably want to spend at least a day or two studying those resources before you try to implement it. Because if you try to implement it without fully understanding how it works, you're going to be lost, and your implementation will show it. The algorithm isn't exactly simple, but it's quite approachable.
If you just want some code, there's a good implementation here: https://codereview.stackexchange.com/questions/115624/aho-corasick-for-multiple-exact-string-matching-in-java.
Given this interface...
public static interface Node
{
int getValue();
List<Node> getChildren();
}
Implement the following method to return the average of all node values in the tree.
public static double getAverage(Node root)
{
}
I'm having an extremely hard time with this practice problem and have a few questions.
I assume this is best completed using recursion, is that correct?
Is this possible without a helper method or global variables?
Is this possible without having to traverse the tree twice? (Once for node sum, once for node count)
Additionally, if someone could provide some psuedo-code, I'd greatly appreciate it.
You can use recursion, but it is possible to solve this problem without it, too. What you need is just a depth-first search. You can implement it iteratively using a stack.
Yes, it is possible. A version with a stack does not require any additional methods.
Yes, it is possible. You can just compute both of these values during one traversal.
Here is a pseudo code of a non-recursive implementation:
valuesSum = 0
nodesCount = 0
stack = new Stack
stack.add(root)
while (!stack.isEmpty())
Node v = stack.poll()
nodesCount++
valuesSum += v.getValue()
for (child : v.getChildren())
stack.add(child)
return valuesSum / nodesCount
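In Java, the pseudocode above translates almost directly. A sketch (the `SimpleNode` class is a throwaway implementation of the interface, added only so the example runs on its own):

```java
import java.util.*;

class TreeAverage {
    interface Node {
        int getValue();
        List<Node> getChildren();
    }

    // Iterative DFS with an explicit stack: one pass accumulates both sum and count.
    static double getAverage(Node root) {
        long sum = 0;
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node v = stack.pop();
            count++;
            sum += v.getValue();
            for (Node child : v.getChildren())
                stack.push(child);
        }
        return (double) sum / count; // cast avoids integer division
    }

    // Throwaway implementation of the interface, for the demo only.
    static class SimpleNode implements Node {
        final int value;
        final List<Node> children;
        SimpleNode(int value, Node... children) {
            this.value = value;
            this.children = Arrays.asList(children);
        }
        public int getValue() { return value; }
        public List<Node> getChildren() { return children; }
    }
}
```

Note the cast to double before dividing: without it, `valuesSum / nodesCount` would silently truncate.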
I have an ArrayList which contains my nodes. A node has a source, a target, and a cost. I have to iterate over the whole ArrayList, which for over 1000 nodes takes a while. Therefore I sorted the list by source and tried binary search to find the corresponding pair. Unfortunately that only works if I compare either the source or the target, but I have to compare both to get the right pair. Is there another way to search an ArrayList efficiently?
Unfortunately, no. ArrayLists are not made to be searched efficiently; they are for storing data, not searching it. If you merely want to know whether an item is contained, I would suggest using a HashSet, as the lookup will have a time complexity of O(1) instead of the ArrayList's O(n) (assuming you have implemented working equals and hashCode methods for your objects).
If you want fast lookups of objects, I recommend a dictionary-style implementation such as HashMap. If you can afford the space requirement, you can keep multiple maps, each with a different key, so you get a fast lookup of your object no matter which key you have to search by. Keep in mind that the lookup also requires a correct equals method. Unfortunately, this requires each key to be unique, which may not be a brilliant idea in your case.
However, you can use a HashMap to store, for each source, a list of the nodes that have that source. You can do the same for cost and target. That way you can substantially reduce the number of nodes you need to iterate over. This should prove a good solution for a sparsely connected network.
private HashMap<Source, ArrayList<Node>> sourceMap = new HashMap<Source, ArrayList<Node>>();
private HashMap<Target, ArrayList<Node>> targetMap = new HashMap<Target, ArrayList<Node>>();
private HashMap<Cost, ArrayList<Node>> costMap = new HashMap<Cost, ArrayList<Node>>();
/** Look for a node with a given source */
for( Node node : sourceMap.get(keySource) )
{
/** Test the node for equality with a given node. Equals method below */
if(node.equals(nodeYouAreLookingFor)) { return node; }
}
In order to be sure that your code will work, be sure to overwrite the equals method. I know I have said so already but this is a very common mistake.
@Override
public boolean equals(Object object)
{
    if(object instanceof Node)
    {
        Node node = (Node) object;
        return source.equals(node.getSource()) && target.equals(node.getTarget());
    }
    return false;
}
If you don't, the test will simply compare references which may or may not be equal depending on how you handle your objects.
Edit: Just read what you base your equality upon. The equals method should be implemented in your node class. However, for it to work, you need to implement and override the equals method for the source and target too. That is, if they are objects. Be watchful though, if they are Nodes too, this may result in quite some tests spanning all of the network.
Update: Added code to reflect the purpose of the code in the comments.
ArrayList<Node> matchingNodes = new ArrayList<>(sourceMap.get(desiredSource));
matchingNodes.retainAll(targetMap.get(desiredTarget)); // retainAll mutates the list and returns a boolean
Now you have a list of all the nodes that match both the source and the target criteria. Provided you are willing to sacrifice a bit of memory, this lookup costs roughly O(|sourceList| × |targetList|), since retainAll performs a contains check per element, which beats the O(|allNodeList|) linear scan whenever the per-source and per-target lists are small. If your network resembles a naturally occurring network then, as Albert-László Barabási has shown, it is likely scale-free. This means that splitting your network into lists per source and per target will likely (I have no proof for this) result in a scale-free size distribution of those lists, so |sourceList| and |targetList| should be substantially lower than |allNodeList|.
You'll need to combine the source and target into a single comparator, e.g.
public int compare(T o1, T o2) {
if(o1.source < o2.source) { return -1; }
else if(o1.source > o2.source) { return 1; }
// else o1.source == o2.source
else if(o1.target < o2.target) { return -1; }
else if(o1.target > o2.target) { return 1; }
else return 0;
}
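Concretely, the compound comparator can be used both to sort the list and to binary-search it for an exact source/target pair. A sketch, assuming source and target are ints (the `Node` fields and the `find` helper here are made up for illustration):

```java
import java.util.*;

class Node {
    final int source, target, cost;

    Node(int source, int target, int cost) {
        this.source = source;
        this.target = target;
        this.cost = cost;
    }

    // Order by source first, then target, so binarySearch can locate an exact pair.
    static final Comparator<Node> BY_SOURCE_THEN_TARGET =
            Comparator.comparingInt((Node n) -> n.source)
                      .thenComparingInt(n -> n.target);

    static Node find(List<Node> sorted, int source, int target) {
        // The probe's cost is irrelevant: the comparator ignores it.
        int i = Collections.binarySearch(sorted, new Node(source, target, 0),
                                         BY_SOURCE_THEN_TARGET);
        return i >= 0 ? sorted.get(i) : null;
    }
}
```

Sort once with `list.sort(Node.BY_SOURCE_THEN_TARGET)`, after which each lookup is O(log n).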
You can use the .compareTo() method to compare your nodes.
You can create two ArrayLists. The first sorted by source, the second sorted by target.
Then you can search by source or target using binarySearch on the corresponding List.
You can make a helper class to store source-target pairs:
class SourceTarget {
public final Source source; // public fields are OK when they're final and immutable.
public final Target target; // you can use getters but I'm lazy
// (don't give this object setters. Map keys should ideally be immutable)
public SourceTarget( Source s, Target t ){
source = s;
target = t;
}
@Override
public boolean equals( Object other ){
// Implement in the obvious way (only equal when both source and target are equal)
}
@Override
public int hashCode(){
// Implement consistently with equals
}
}
Then store your things in a HashMap<SourceTarget, List<Node>>, with each source-target pair mapped to the list of nodes that have exactly that source-target pair.
To retrieve just use
List<Node> results = map.get( new SourceTarget( node.source, node.target ) );
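Filling in the equals/hashCode stubs, the helper could look like this (a sketch; plain Strings stand in for the Source and Target types purely for illustration, and any types with proper equals/hashCode work the same way):

```java
import java.util.*;

final class SourceTarget {
    final String source; // any types work, as long as they implement equals/hashCode
    final String target;

    SourceTarget(String source, String target) {
        this.source = source;
        this.target = target;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SourceTarget)) return false;
        SourceTarget st = (SourceTarget) other;
        return source.equals(st.source) && target.equals(st.target);
    }

    @Override
    public int hashCode() {
        return 31 * source.hashCode() + target.hashCode(); // consistent with equals
    }
}
```

With that in place, `map.computeIfAbsent(new SourceTarget(s, t), k -> new ArrayList<>()).add(node)` groups nodes by pair, and each lookup is O(1) on average.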
Alternatively to making a helper class, you can use the comparator in Zim-Zam's answer and a TreeMap<Node,List<Node>> with a representative Node object acting as the SourceTarget pair.
I am trying to create a TraversalDescription that will perform the following search;
Return only nodes that have a certain property ("type" == "PERSON")
Return a certain fixed number of results (the entire graph is huge, we are only interested in the local graph)
Any relationship types can be used
I haven't managed to get very far; I can't seem to figure out how to create an Evaluator for node properties:
TraversalDescription td = Traversal.description().breadthFirst().evaluator(?...);
I fixed this by simply implementing the Evaluator interface and overriding the Evaluator.evaluate(Path p) method:
public final class MyEvaluator implements Evaluator {
private int peopleCount;
private int maxPeople;
public MyEvaluator(int max) {
maxPeople = max;
peopleCount = 0;
}
public Evaluation evaluate(Path p) {
//prune if we have found the required number already
if(peopleCount >= maxPeople) return Evaluation.EXCLUDE_AND_PRUNE;
//grab the node of interest
Node n = p.endNode();
//include if it is a person
if(n.hasProperty("type") && (n.getProperty("type").equals(NodeTypes.PERSON.name()))) {
peopleCount++;
return Evaluation.INCLUDE_AND_CONTINUE;
}
// otherwise just carry on as normal
return Evaluation.EXCLUDE_AND_CONTINUE;
}
}
And then my TraversalDescription definition ends up looking like:
TraversalDescription td = Traversal.description().breadthFirst().evaluator(new MyEvaluator(peopleRequired));
Even when coding in Java, I'd recommend starting with a Cypher query for traversals, only dropping down into TraversalDescriptions if you really want to tweak the performance or conduct some interesting operations.
From what you've described and assuming you have the id of the start node, a Cypher query could be:
start n=node(1) match (n)-[*1..2]-(m) where m.type="Person" return distinct(m) limit 2
That would find all nodes between 1 and 2 hops away from the starting node, following any relationship type, but where the nodes have a type property set to "Person", finally returning only 2 distinct results. You can try that using an example on console (to which I've added "type" properties).
To execute that from within Java, you'd create an ExecutionEngine, provide the query, then iterate over the results as described in the Neo4j Manual.