Been trying to build a method that gets all conceivable unique paths through a DAG. Went with recursion because it seemed like the easiest to understand. Ended up with this:
public class Brutus {

    // the previous nodes visited
    public ArrayList<Node> resultHistory = new ArrayList<Node>();

    // Directed graph class; contains a HashMap [adjacencies]
    // keyed by node, holding all of that node's edges
    public AdjacencyList paths;

    // A list of all the pathways between nodes, each represented as a list of nodes
    public ArrayList<ArrayList<Node>> allPaths = new ArrayList<ArrayList<Node>>();

    public Brutus(AdjacencyList paths) {
        this.paths = paths;
    }

    public ArrayList<ArrayList<Node>> findAll() {
        int counter = 1;
        for (Node key : paths.adjacencies.keySet()) {
            System.out.println("[" + counter + "]: " + key.toString());
            StringTokenizer st = new StringTokenizer(
                    paths.getAdjacentString(key), ",");
            while (st.hasMoreTokens()) {
                String child = st.nextToken();
                if (paths.getNodeFromGraph(child) != null) {
                    resultHistory = new ArrayList<Node>();
                    resultHistory.add(key);
                    findPath(child, resultHistory);
                }
            }
            counter++;
        }
        return allPaths;
    }

    public void findPath(String child, ArrayList<Node> resultHistory) {
        if (resultHistory.contains(paths.getNodeFromGraph(child))) {
            return;
        }
        resultHistory.add(paths.getNodeFromGraph(child));
        if (!(inList(resultHistory, allPaths))) {
            allPaths.add(resultHistory);
        }
        StringTokenizer st = new StringTokenizer(
                paths.getAdjacentString(paths.getNodeFromGraph(child)), ",");
        while (st.hasMoreTokens()) {
            child = st.nextToken();
            if (paths.getNodeFromGraph(child) != null) {
                findPath(child, resultHistory);
            }
        }
    }

    public boolean inList(ArrayList<Node> resultHistory,
            ArrayList<ArrayList<Node>> allPaths) {
        for (int i = 0; i < allPaths.size(); i++) {
            if (allPaths.get(i).equals(resultHistory)) {
                return true;
            }
        }
        return false;
    }
}
The problem is, I don't think it finds all the paths: certain paths I expect are missing from the output. As the dataset is 900 nodes, I haven't been able to spot a pattern in which ones are missing. Other questions on Stack Overflow seem to be somewhat more specialized, so I attempted to build my own algorithm.
Can anyone either suggest a better way to do this, or tell me what I've done wrong?
If the algorithm is correct, what would be the best way to extract all the paths between two specific nodes?
EDIT: I now realize that new paths don't get created from the child nodes of the original node; how would I make it so they do?
Here is an implementation based on the BFS algorithm.
I will denote a path as a sequence of vertices l = (v, v', v'', ...) and I will perform the following two operations on it:
extend(l, v): puts vertex v at the end of list l;
v = back(l): gets the last vertex in list l.
FindPaths(G, v) {
    // The first path is, simply, the starting node.
    // It should be the first vertex in topological order.
    pending_paths = {(v)};
    while (pending_paths is not empty) {
        l = pending_paths.remove_first(); // pop the first pending path
        output(l);    // output it (or save it in a list to be returned, if you prefer)
        v = back(l);  // get the last vertex of the path
        foreach(edge (v, v') in G) {      // for each edge outgoing from v...
            // extend l with v' and put it into the list of paths to be examined
            pending_paths.push_back(extend(l, v'));
        }
    }
}
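The same idea in Java, as a minimal sketch that assumes the graph is given as a plain adjacency map (Map<String, List<String>>) rather than the asker's AdjacencyList class; each extension copies the path, which is simple to follow but memory-hungry:
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class PathEnumerator {
    // Enumerates every path starting at 'start', following the BFS scheme above.
    public static List<List<String>> findPaths(Map<String, List<String>> adj, String start) {
        List<List<String>> result = new ArrayList<>();
        Deque<List<String>> pendingPaths = new ArrayDeque<>();
        List<String> first = new ArrayList<>();
        first.add(start);                                    // the first path is just the start node
        pendingPaths.add(first);
        while (!pendingPaths.isEmpty()) {
            List<String> path = pendingPaths.removeFirst();  // pop the first pending path
            result.add(path);                                // output it
            String last = path.get(path.size() - 1);         // last vertex of the path
            for (String next : adj.getOrDefault(last, new ArrayList<>())) {
                List<String> extended = new ArrayList<>(path);
                extended.add(next);                          // extend the path with the neighbour
                pendingPaths.addLast(extended);
            }
        }
        return result;
    }
}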
Here's a simple recursive algorithm, expressed in pseudocode to avoid clouding the issue with lots of Java list manipulation:
AllPaths(currentNode):
    result = EmptyList()
    foreach child in children(currentNode):
        subpaths = AllPaths(child)
        foreach subpath in subpaths:
            Append(result, currentNode + subpath)
    return result
Calling AllPaths on the root node will give you what you need, and you can improve the running time for nontrivial DAGs by caching the result of AllPaths on each node, so you only need to compute it once rather than once per distinct path that includes it.
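In Java, that caching could look roughly like the sketch below. It assumes a Node type that exposes getChildren() returning an empty list for leaves (not part of the question's classes, so adjust to your own API), and it adds the single-node path explicitly so that leaf nodes contribute a base case; even with the cache the number of paths can still grow very quickly:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AllPathsFinder {
    // Cache: each node's downward paths are computed once and reused.
    private final Map<Node, List<List<Node>>> cache = new HashMap<>();

    List<List<Node>> allPaths(Node current) {
        List<List<Node>> cached = cache.get(current);
        if (cached != null) {
            return cached;
        }
        List<List<Node>> result = new ArrayList<>();
        List<Node> single = new ArrayList<>();
        single.add(current);
        result.add(single);                        // base case: the path containing only this node
        for (Node child : current.getChildren()) {
            for (List<Node> subpath : allPaths(child)) {
                List<Node> path = new ArrayList<>();
                path.add(current);                 // currentNode + subpath, as in the pseudocode
                path.addAll(subpath);
                result.add(path);
            }
        }
        cache.put(current, result);
        return result;
    }
}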
So while akappa's pseudocode was a good start, it took me a while to understand how to make it work. If anyone else comes across this post, here's how I did it:
public ArrayList<ArrayList<Node>> searchAll() {
    try {
        BufferedWriter out = new BufferedWriter(new FileWriter("output.txt"));
        //Gets Nodes from Hashmap and puts them into Queue
        for (Node key : paths.adjacencies.keySet()) {
            queue.addToQueue(new QueueItem(key.chemName, new ArrayList<Node>()));
        }
        while (queue.getSize() > 0) {
            QueueItem queueItem = queue.getFromQueue();
            Node node = paths.getNodeFromGraph(queueItem.getNodeId());
            if (node != null) {
                findRelationAll(node, queueItem, out);
            }
        }
        System.out.println("Cycle complete: Number of Edges: [" + resultHistoryAll.size() + "]");
        out.close();
    } catch (IOException e) {
    }
    return resultHistoryAll;
}

public void findRelationAll(Node node, QueueItem queueItem, BufferedWriter out) {
    if (!foundRelation) {
        StringTokenizer st = new StringTokenizer(paths.getAdjacentString(node), ",");
        while (st.hasMoreTokens()) {
            String child = st.nextToken();
            ArrayList<Node> history = new ArrayList<Node>();
            //Gets previous Nodes
            history.addAll(queueItem.getHistoryPath());
            //Makes sure path is unique
            if (history.contains(node)) {
                System.out.println("[" + counter2 + "]: Cyclic");
                counter2++;
                continue;
            }
            history.add(node);
            resultHistory = history;
            queue.addToQueue(new QueueItem(child, history));
            if (!(inList(resultHistory, resultHistoryAll))) {
                resultHistoryAll.add(resultHistory);
                try {
                    out.write("[" + counter + "]: " + resultHistory.toString());
                    out.newLine();
                    out.newLine();
                } catch (IOException e) {
                }
                System.out.println("[" + counter + "]: " + resultHistory.toString());
                counter++;
            } else {
                System.out.println("[" + counter3 + "]: InList");
                counter3++;
            }
        }
    }
}

//This checks path isn't in List already
public boolean inList(ArrayList<Node> resultHistory, ArrayList<ArrayList<Node>> allPaths) {
    for (int i = 0; i < allPaths.size(); i++) {
        if (allPaths.get(i).equals(resultHistory)) {
            return true;
        }
    }
    return false;
}
}
The above code does a few extra things that you might not want:
It writes pathways to a text file as a list of nodes plus its counter value.
Makes sure the path doesn't cross the same node twice
Makes sure no two pathways are the same in the final list (in normal circumstances this is unnecessary work)
The QueueItem object is just a way to store the previously visited nodes. It's part of nemanja's code, which is what my code was based on.
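A minimal version, reconstructed here from how it is used above (not necessarily nemanja's exact class), is essentially just a node id paired with the path taken to reach it:
// Minimal QueueItem: a node id plus the history of nodes visited before it.
public class QueueItem {
    private final String nodeId;
    private final ArrayList<Node> historyPath;

    public QueueItem(String nodeId, ArrayList<Node> historyPath) {
        this.nodeId = nodeId;
        this.historyPath = historyPath;
    }

    public String getNodeId() {
        return nodeId;
    }

    public ArrayList<Node> getHistoryPath() {
        return historyPath;
    }
}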
Hat tip to him, akappa (for the most efficient answer), and jacobm (for finding a solution like my original code, and explaining its limitations).
In case anyone's actually interested in the work: I'm currently processing over 5 million pathways, of which 60,000 are unique pathways between 900 chemicals. And that's just pathways of 1, 2, 3 or 4 chemicals... Biology is complicated.
EDIT and warning: if anyone is handling huge reams of data like me, Windows 7 - or at least my machine - throws a fit and closes the program once an ArrayList passes about 63,000 objects, regardless of how you arrange the pointers. The workaround I started with was to restart the list after 60,000 objects while writing everything to CSV. This led to some duplicates between list iterations, and should ultimately be superseded by my moving to Linux tomorrow!
I am using the Neo4j Traversal Framework (Java).
I need to create a custom evaluator to include nodes where some condition is true.
My code is:
@Override
public Evaluation evaluate(Path path) {
    log.info("Node Id: " + path.endNode().getProperty("DIST_ID"));
    long mCount = 0;
    if ((Long) path.endNode().getProperty("RANK") >= 3) {
        mCount++;
    }
    log.info("mCount " + mCount);
    return Evaluation.INCLUDE_AND_CONTINUE;
}
};
TraversalDescription traversalDescription = db.traversalDescription();
Traverser traverser = traversalDescription.breadthFirst()
.relationships(RelationshipTypes.SPONSOR, Direction.OUTGOING).evaluator(e).traverse(DSTS);
DSTS is the incoming (top) node. I want to split the downline nodes using rank. For example, if I need two levels, I want to split them into two levels using rank; rank is one of the properties of a node. If the rank is 5, I want to collect those nodes and their outgoing nodes until I reach rank 5 again.
If there are any possibilities, please guide me...
You could look at branch state, i.e. use the evaluate method which also takes a BranchState argument. You can use that to keep state for each visited traversal branch and augment it when moving down traversal branches. E.g.:
new PathEvaluator<Long>() {
    public Evaluation evaluate(Path path, BranchState<Long> state) {
        log.info("Node Id: " + path.endNode().getProperty("DIST_ID"));
        long mCount = state.getState();
        if ((Long) path.endNode().getProperty("RANK") >= 3) {
            mCount++;
            state.setState(mCount);
        }
        if (mCount >= 5) {
            // do something
        }
        log.info("mCount " + mCount);
        return Evaluation.INCLUDE_AND_CONTINUE;
    }
}
This is equivalent to summing up the RANKs from the whole path on every evaluate call, but the BranchState makes it more performant.
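For the state to actually start at zero you also need to supply an initial branch state when building the traversal. A rough, untested sketch of how the wiring might look; PathExpanders, InitialBranchState and PathEvaluator.Adapter are assumed to be available in your Neo4j version, and exact signatures vary between releases:
// Sketch: stateful evaluator wired into the traversal with an initial state of 0L.
TraversalDescription td = db.traversalDescription()
        .breadthFirst()
        .expand(PathExpanders.forTypeAndDirection(RelationshipTypes.SPONSOR, Direction.OUTGOING),
                new InitialBranchState.State<Long>(0L, 0L))
        .evaluator(new PathEvaluator.Adapter<Long>() {
            @Override
            public Evaluation evaluate(Path path, BranchState<Long> state) {
                long mCount = state.getState();
                if ((Long) path.endNode().getProperty("RANK") >= 3) {
                    mCount++;
                    state.setState(mCount);
                }
                return Evaluation.INCLUDE_AND_CONTINUE;
            }
        });
Traverser traverser = td.traverse(DSTS);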
Is this something you were thinking of?
I am absolutely confused about what to do. I'm trying to code off of the pseudocode that Wikipedia has for Dijkstra's algorithm with priority queues, but I'm having a tough time making the adjustments to fit what I need to find. This is my (incomplete) code so far, and any help would be very much appreciated.
public int doDijkstras (String startVertexName, String endVertexName, ArrayList< String > shortestPath) {
PriorityQueue<QEntry> q = new PriorityQueue<QEntry>();
int cost = 0;
int newCost;
QEntry pred = null;
for (String s : this.getVertices()) {
if (!s.equals(startVertexName)) {
cost = Integer.MAX_VALUE;
pred = null;
}
q.add(new QEntry(s, cost, pred, adjacencyMap.get(s)));
}
while (!q.isEmpty()) {
QEntry curr = getMin(q);
for (String s : curr.adj.keySet()) {
newCost = curr.cost + this.getCost(curr.name, s);
QEntry v = this.getVert(q, s);
if (newCost < v.cost) {
v.cost = newCost;
v.pred = curr;
if (!q.contains(curr)) {
q.add(curr);
}
}
}
}
}
private QEntry getMin(PriorityQueue<QEntry> q) {
QEntry min = q.peek();
for (QEntry temp : q) {
if (min.cost > temp.cost) {
min = temp;
}
}
return min;
}
private QEntry getVert(PriorityQueue<QEntry> q, String s) {
for (QEntry temp : q) {
if (temp.name.equals(s)) {
return temp;
}
}
return null;
}
class QEntry {
String name;
int cost;
QEntry pred;
TreeMap<String, Integer> adj;
public QEntry(String name, int cost, QEntry pred, TreeMap<String, Integer> adj) {
this.name = name;
this.cost = cost;
this.adj = adj;
this.pred = pred;
}
}
You are overlooking an important part of the algorithm: when to stop.
The pseudocode on Wikipedia is for the variation on Dijkstra's algorithm that computes the shortest path from the start node to every node connected to it. Commentary immediately following the big pseudocode block explains how to modify the algorithm to find only the path to a specific target, and after that is a shorter block explaining how to extract paths.
In English, though, as you're processing your priority queue, you need to watch for the target element being the one selected. When (if ever) it is, you know that no shorter path to it can be discovered than the one having the cost recorded in the target's queue entry, and represented (in reverse order) by that entry and its chain of predecessors. You fill the path list by walking the chain of predecessors, and you return the value that was recorded in the target queue entry.
Note, however, that in your code, in the event that the start and target vertexes are not connected in the graph (including if the target is not in the graph at all), you will eventually drain the queue and fall out the bottom of the while loop without ever reaching the target. You have to decide what to do with the path list and what to return in that case.
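As a rough sketch of that stop-and-reconstruct step, using your own QEntry fields (this belongs inside the while loop, right after the minimum-cost entry has been removed from the queue; it is an illustration, not a drop-in for the rest of your method):
// When the target itself is the minimum entry, its recorded cost is final.
if (curr.name.equals(endVertexName)) {
    // Walk the predecessor chain backwards to fill the path list, then return the cost.
    for (QEntry e = curr; e != null; e = e.pred) {
        shortestPath.add(0, e.name);   // prepend, so the list reads start -> end
    }
    return curr.cost;
}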
Note, too, that your code appears to have several errors, among them:
In the event that the start vertex name is not the first one in the iteration order of this.getVertices(), its queue entry will not be initialized with cost 0, and will not likely be the first element chosen from the queue.
If the specified start vertex is not in the graph at all then your code will run, and may emit a path, but its output in that case is bogus.
Your queue elements (type QEntry) do not have a natural order; to create a PriorityQueue whose elements have such a type, you must provide a Comparator that defines their relative priorities (a one-line example follows this list).
You are using your priority queue as a plain list. That in itself will not make your code produce wrong results, but it does increase its asymptotic complexity.
Be aware, however, that if you use the standard PriorityQueue as a priority queue, then you must never modify an enqueued object in a way that could change its order relative to any other enqueued object; instead, remove it from the queue first, modify it, then enqueue it again.
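To address the natural-ordering point above, the queue can be constructed with a Comparator over the cost field, for example:
// Without a Comparator (or Comparable elements), the queue throws ClassCastException
// as soon as two QEntry objects need to be compared.
PriorityQueue<QEntry> q = new PriorityQueue<QEntry>(Comparator.comparingInt((QEntry e) -> e.cost));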
I have this algorithm and I want to implement a graph search, using recursive backtracking.
First of all my code:
public static boolean buildTree(GenericTreeNode<String> inputNode){
while(!interruptFlag)
{
try { Thread.sleep(200); } catch(InterruptedException e) {}
gui.frame.MainWindow.progress.setText("Iterations Deployment: " + c);
gui.panel.ResultMatrix.setResult(mappingList);
Multimap<String,String> openList = LinkedHashMultimap.create();
openList = UtilityClasses.getOpenList.getOpenList(dataMap, ApplicationList, HardwareList, mappingList);
if(openList.isEmpty() && !mappingList.keySet().containsAll(XMLParser.ApplicationsListGUI))
{
gui.frame.MainWindow.labelSuccess.setText("Mapping not succesful!");
return false;
}
if(openList.isEmpty() && mappingList.keySet().containsAll(XMLParser.ApplicationsListGUI))
{
System.out.println(calculateOverallCost.getOverallCosts());
System.out.println("Mapping done:" + " " + mappingList);
gui.panel.ResultMatrix.setResult(mappingList);
return true;
}
if(!openList.isEmpty() && (!mappingList.keySet().containsAll(XMLParser.ApplicationsListGUI)))
{
for(String s : openList.keySet())
{
for(String h : openList.get(s))
{
GenericTreeNode<String> child = new GenericTreeNode<String>(s + ":" + h);
inputNode.addChild(child);
child.setCosts(UtilityClasses.CostFunction.calculateCostFunction(s, h));
}
}
List<GenericTreeNode<String>> childlist = inputNode.getChildren();
Collections.sort(childlist);
for(int i = 0; i < childlist.size() ; i++)
{
inputNode = childlist.get(i);
// do something
if (buildTree(inputNode))
{
return true;
}
else
{
// undo something
}
}
That's the code I have so far. It builds the tree in every step. Every node in the tree is a possible solution, ordered by a heuristic cost function. The first two if-clauses are the conditions to terminate and return. If there is a solution, it finds it pretty smoothly. But if there is no quick solution, I need to undo the last step and try some other combinations. In the worst case, every combination should be tested.
The childlist holds all child nodes, ordered by their cost function. The one with the lowest cost function will be chosen for expansion. Building the tree is done recursively, but I have problems with the backtracking. I can't get the search to go back a step and try the second-best node, and so on. The graph is expanded at every step with the newly calculated openList. I saved a reference to the parent node, if that could help.
The openList is a list which holds every possible next step (nodes).
Maybe this picture will help explain my problem better:
That's more or less the search I wanted to realize. But the code I have so far gets stuck at the end of a leaf, no matter whether a solution is found or not. I tried many different things, but this backtracking doesn't seem to work for my kind of problem, or at least I can't get it going.
If I understood correctly, this needs a pre-order tree visit.
I omitted some details, but I think this code will help you (I haven't tested it):
public static boolean buildTree(GenericTreeNode<String> inputNode) {
if (interruptFlag) {
// search was interrupted
// answer has not be found yet
return false;
}
boolean something = openList.isEmpty() && !mappingList.keySet().containsAll(XMLParser.ApplicationsListGUI);
if (something) {
// ... Mapping not succesful!
// answer can't be found
return false;
}
boolean answerFound = openList.isEmpty() && (mappingList.keySet().containsAll(XMLParser.ApplicationsListGUI));
if (answerFound) {
// ...
return true;
}
// answer has not been found
// visit each children
// order children list by cost
// ...
List<GenericTreeNode<String>> childlist = // ...
Collections.sort(childlist);
for (int i = 0; i < childlist.size(); i++) {
inputNode = childlist.get(i);
// do something
boolean childHasAnswer = buildTree(inputNode);
if (childHasAnswer) {
// answer was found
return true;
} // else: our children do not have the answer
}
// neither we or our children have the answer, let's go to the parent
return false;
}
I mainly deleted the first while loop and the last else.
I'm trying to implement A* in Java based on OSM data. My problem is that my implementation is not working correctly. First of all, the path found is not the shortest. Second, the closedlist ends up containing about a third more nodes than Dijkstra's does, which is not what I expected.
Here is my A* code, which is based on the Wikipedia pseudocode:
public Object[] executeAstar(ArrayList<Arclistentry> data, NodeD start, NodeD dest,long[] nodenur)
{
openlist = new PriorityQueue<NodeD>(1,comp);
closedlist.clear();
openlist.offer(start);
start.setg(0);
start.seth(calccost(start, dest));
start.setf(start.getg()+start.geth());
while(!openlist.isEmpty())
{
NodeD currentnode = openlist.poll();
if(currentnode.getnodenumber() == dest.getpredessor())
{
closedlist.add(currentnode);
return drawway(closedlist, start, dest);
}
closedlist.add(currentnode);
ArrayList<Arclistentry> entries = neighbors.get((int)currentnode.getnodenumber()-1);
for(Arclistentry aentry:entries)
{
NodeD successor = new NodeD(aentry.getnode(),aentry.getstart(), aentry.getcoorddest());
float tentative_g = currentnode.getg()+calccost(currentnode,successor);//+aentry.getcost();
if(contains(successor, closedlist))
{
continue;
}
if((contains(successor,openlist))&& tentative_g >= aentry.getcost())
{
continue;
}
if(!contains(successor, openlist))
{
successor.setpredessor(currentnode.getnodenumber());
successor.setg(tentative_g);
successor.seth(calccost(successor, dest));
successor.setf(successor.getg()+successor.geth());
openlist.offer(successor);
}
else
{
openlist.remove(successor);
successor.setpredessor(currentnode.getnodenumber());
successor.setg(tentative_g);
successor.seth(calccost(successor, dest));
successor.setf(successor.getg()+successor.geth());
openlist.offer(successor);
}
}
}
return drawway(closedlist,start, dest);
}
My heuristic is calculated using the Euclidean distance. But to also take the cost of the node into account, the costs are multiplied by the heuristic result. My data structure contains the following:
private long nodenumber;
private long predessor;
private float label;
private float f;
private float g;
private float h;
private double[] coord = new double[2];
public NodeD(long nodenr, long predessor, double[] coor)
{
this.nodenumber = nodenr;
this.predessor = predessor;
this.coord = coor;
}
public NodeD(long nodenr, long predessor, float label)
{
this.nodenumber = nodenr;
this.predessor = predessor;
this.label = label;
}
and for the arclist I use the following:
private long start;
private long dest_node;
private float cost_;
private double[]coordstart = new double[2];
private double[]coorddest = new double[2];
The contains function for the priority queue:
public boolean contains(NodeD o, PriorityQueue<NodeD> al)
{
Iterator<NodeD> e = al.iterator();
if (o==null)
{
while (e.hasNext())
{
if (e.next()==null)
{
return true;
}
}
}
else
{
while (e.hasNext())
{
NodeD t = e.next();
if(t.equals(null))
{
return false;
}
if (((o.getnodenumber()==t.getnodenumber()) & (o.getpredessor()==t.getpredessor()))||(o.getnodenumber()==t.getpredessor() & o.getpredessor()==t.getnodenumber()))
{
return true;
}
}
return false;
}
return false;
}
and contains for the ArrayList (because detection was not working correctly with the built-in ArrayList.contains function):
public boolean contains(NodeD o, ArrayList<NodeD> al) {
return indexOf(o,al) >= 0;
}
public int indexOf(NodeD o, ArrayList<NodeD> al) {
if (o == null) {
for (int i = 0; i < al.size(); i++)
if (al.get(i)==null)
return i;
} else {
for (int i = 0; i < al.size(); i++)
{
if ((o.getpredessor()==al.get(i).getpredessor())) //(o.getnodenumber()==al.get(i).getnodenumber()) &&
{
return i;
}
else if((o.getpredessor()==al.get(i).getnodenumber())&&(o.getnodenumber()==al.get(i).getpredessor()))
{
return i;
}
}
}
return -1;
}
The problem is that the algorithm visits all nodes. The other problem is the sorted openlist, which pushes neighbors of the current node up because they have a lower f value. So what am I doing wrong in implementing this algorithm?
Recap of all our previous answers:
Make sure the A* estimation is a lower estimate, otherwise it will wrongly skip parts (see the heuristic sketch after this list)
Do not iterate over all nodes to determine the index of the edges of your current node's edge set in an array
When creating new objects to put in your queue/sets, checks should be done on the properties of the nodes
If your focus is on speed, avoid as much work as possible by aborting non-interesting searches as soon as possible
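On the first point: if the edge costs are plain geometric distances, the safest heuristic is the straight-line distance on its own, with nothing multiplied onto it. A sketch, assuming NodeD exposes its coord array through a getter (here called getcoord(), which is not shown in the question) and that the coordinates are projected x/y values in the same unit as the edge costs; for raw OSM lat/lon you would use a great-circle formula instead:
// Admissible heuristic: plain Euclidean distance, never larger than the true remaining cost.
private float calcHeuristic(NodeD node, NodeD dest) {
    double dx = node.getcoord()[0] - dest.getcoord()[0];
    double dy = node.getcoord()[1] - dest.getcoord()[1];
    return (float) Math.sqrt(dx * dx + dy * dy);
}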
I'm still unsure about this line:
if((contains(successor,openlist))&& tentative_g >= aentry.getcost())
What I think you are trying to do is to avoid adding a new node to the queue when you already have a better value for it in there. However, tentative_g is the length of the path from your starting node to your current node while aentry.getcost seems to be the length of the edge you are relaxing. That doesn't seem right to me... Try to retrieve the correct (old) value to compare against your new tentative label.
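In other words, the check should compare against the g value already stored for that node in the open list, something along these lines (a sketch only; the lookup helper is hypothetical, e.g. a variant of your contains method that returns the matching entry instead of a boolean):
// Compare the new tentative distance with the best distance already recorded for this
// node in the open list, not with the cost of the edge being relaxed.
NodeD existing = findInQueue(successor, openlist);   // hypothetical lookup, returns null if absent
if (existing != null && tentative_g >= existing.getg()) {
    continue;   // the path already queued is at least as good, so skip this one
}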
Lastly, for your current code, I would also make the following changes:
Use a HashSet for your closedlist. Every time you check whether a node is in there, you currently have to go over all of them, which is not that efficient... Try using a HashSet by overriding the hash function (and equals) of your NodeD objects (see the sketch after this list); the built-in contains function is then much faster than your current approach. A similar argument can be made for your openlist. You cannot change the PQ to a set, but you could omit the contains-checks: if you add a node with a stale priority, you will always poll the correct priority first (because it is a PQ) and can then, when polling the stale entry, just skip it. That's a small optimisation that trades off the size of the PQ against PQ lookup operations.
avoid recalculating stuff (mainly calccost()) by calculating it once and reusing the value when you need it (small time gain but nicer code).
try to avoid multiple lines with the same code by placing them in the right spot (e.g. the two closedlist.add calls can be merged into one add call placed above the if condition; if you have something like if(..){doA();doB();}else{doA();doC();}, try to put doA() before the if for legibility)
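For the HashSet point in the first bullet above, the overrides might look like this if two NodeD objects should count as the same node whenever their node numbers match (adjust the fields to whatever identity you really want; equals and hashCode must agree):
// Inside NodeD: identity based on the node number only.
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof NodeD)) return false;
    return this.getnodenumber() == ((NodeD) o).getnodenumber();
}

@Override
public int hashCode() {
    return Long.hashCode(getnodenumber());
}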
I'm writing a function which generates all paths in a tree as XPath statements and stores them in a bag. Below is a naive version (sorry, this is long), and below that is my attempt to optimize it:
/**
* Create the structural fingerprint of a tree. Defined as the multiset of
* all paths and their multiplicities
*/
protected Multiset<String> createSF(AbstractTree<String> t,
List<AbstractTree<String>> allSiblings) {
/*
* difference between unordered and ordered trees is that the
* next-sibling axis must also be used
*
* this means that each node's children are liable to be generated more
* than once and so are memo-ised and reused
*/
Multiset<String> res = new Multiset<String>();
// so, we return a set containing:
// 1. the node name itself, prepended by root symbol
res.add("/" + t.getNodeName());
List<AbstractTree<String>> children = t.getChildren();
// all of the childrens' sets prepended by this one
if (children != null) {
for (AbstractTree<String> child : children) {
Multiset<String> sub = createSF(child, children);
for (String nextOne : sub) {
if (nextOne.indexOf("//") == 0) {
res.add(nextOne);
} else {
res.add("/" + nextOne);
res.add("/" + t.getNodeName() + nextOne);
}
}
}
}
// 2. all of the following siblings' sets, prepended by this one
if (allSiblings != null) {
// node is neither original root nor leaf
// first, find current node
int currentNodePos = 0;
int ptrPos = 0;
for (AbstractTree<String> node : allSiblings) {
if (node == t) {
currentNodePos = ptrPos;
}
ptrPos++;
}
// 3. then add all paths deriving from (all) following siblings
for (int i = currentNodePos + 1; i < allSiblings.size(); i++) {
AbstractTree<String> sibling = allSiblings.get(i);
Multiset<String> sub = createSF(sibling, allSiblings);
for (String nextOne : sub) {
if (nextOne.indexOf("//") == 0) {
res.add(nextOne);
} else {
res.add("/" + nextOne);
res.add("/" + t.getNodeName() + nextOne);
}
}
}
}
return res;
}
And now the optimization which is (currently) in a subclass:
private Map<AbstractTree<String>, Multiset<String>> lookupTable = new HashMap<AbstractTree<String>, Multiset<String>>();
public Multiset<String> createSF(AbstractTree<String> t,
List<AbstractTree<String>> allSiblings) {
Multiset<String> lookup = lookupTable.get(t);
if (lookup != null) {
return lookup;
} else {
Multiset<String> res = super.createSF(t, allSiblings);
lookupTable.put(t, res);
return res;
}
}
My trouble is that the optimized version runs out of heap space (the vm args are set at -Xms2g -Xmx2g) and is very slow on moderately large input. Can anyone see a way to improve on this?
Run the code through a profiler. That's the only way to get real facts about the code. Everything else is just guesswork.
"generates all paths in a tree as xpath statements"
How many paths are you creating? This can be non-trivial. The number of paths should be O( n log n ), but the algorithm could be much worse depending on what representation they use for children of a parent.
You should profile the simple enumeration of paths without worrying about the bag storage.
Your code eats RAM exponentially: one more layer means children.size() times more RAM.
Try to use a generator instead of materializing the results: Implement a Multiset which does not calculate the results beforehand but iterates through the tree structure as you call next() on the set's iterator.
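A sketch of that generator idea, simplified to plain root-to-node paths (the next-sibling axis and "//" handling from the original code are left out), assuming AbstractTree exposes getNodeName() and getChildren() as in the question:
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Lazily yields one "/a/b/c"-style path per node instead of materialising them all up front.
class PathIterator implements Iterator<String> {
    // Each stack entry is a node still to be visited plus the path accumulated so far.
    private static class Frame {
        final AbstractTree<String> node;
        final String prefix;
        Frame(AbstractTree<String> node, String prefix) { this.node = node; this.prefix = prefix; }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();

    PathIterator(AbstractTree<String> root) {
        stack.push(new Frame(root, ""));
    }

    public boolean hasNext() {
        return !stack.isEmpty();
    }

    public String next() {
        Frame f = stack.pop();
        String path = f.prefix + "/" + f.node.getNodeName();
        List<AbstractTree<String>> children = f.node.getChildren();
        if (children != null) {
            for (AbstractTree<String> child : children) {
                stack.push(new Frame(child, path));   // children are expanded only when asked for
            }
        }
        return path;                                  // one path per call, nothing precomputed
    }
}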