extra space for recursive depth-first search to store paths - java

I am using depth-first search to identify paths in a directed weighted graph, while revisiting nodes that belong to a cycle, and setting cutoff conditions based on total distance traveled, or stops from the source node.
As I understand, with recursion an explicit stack structure is not required for depth first search, so I was wondering if I could further simplify my code below by somehow doing without the explicit stack:
import java.util.Collection;
import java.util.Map;
import java.util.Stack;

public class DFSonWeightedDirectedGraph {
    private static final String START = "A";
    private static final String END = "E";
    private int pathLength = 0;
    private int stops = 0;

    public static void main(String[] args) {
        // this is a directed weighted graph
        WeightedDirectedGraph graph = new WeightedDirectedGraph();
        graph.addEdge("A", "B", 15);
        graph.addEdge("A", "D", 15);
        graph.addEdge("A", "E", 27);
        //(...) more edges added
        Stack<String> visited = new Stack<String>();
        visited.push(START);
        new DFSonWeightedDirectedGraph().depthFirst(graph, visited);
    }

    private void depthFirst(WeightedDirectedGraph graph, Stack<String> visited) {
        Collection<Map.Entry<String, Integer>> tree_of_children
                = graph.get_tree_of_children(visited.peek());
        for (Map.Entry<String, Integer> child : tree_of_children) {
            // cutoff: skip edges that would bring the total distance to 20 or more
            if (pathLength + child.getValue() >= 20) {
                continue;
            }
            visited.push(child.getKey());
            pathLength += child.getValue();
            stops += 1;
            if (child.getKey().equals(END)) {
                printPath(visited);
            }
            depthFirst(graph, visited);
            // backtrack: undo the changes made for this child
            visited.pop();
            pathLength -= child.getValue();
            stops -= 1;
        }
    }

    private void printPath(Stack<String> visited) {
        for (String node : visited) {
            System.out.print(node);
            System.out.print(" ");
        }
        System.out.println("[path length: " + pathLength +
                " stops made: " + stops + "]");
    }
}
However, other recursive implementations without an explicit stack structure usually take into account already visited nodes, by coloring them white, gray or black. So, in my case where revisiting is allowed, and the path needs to be recorded, is an explicit stack absolutely required? Thanks for any suggestions of simpler alternatives.

If you have to save the path, you need a data structure for this. Your stack is OK; you could replace it with another data structure, but not get rid of it.
If it is OK to print the path directly (and not record it), you do not need a stack. Then you can change the method signature to take just the graph and the current node (and perhaps the current path length and the "stops").

Just add an extra field to the node structure, that is a "visited" field. This will be fastest. You do have to unmark all nodes afterwards (or before you do the search).
Or, just hash the id of the node in a hashtable. This will be faster to check than a stack. If you don't have an id for the node, it is a good idea to create one, to help with debugging, output, etc.
You do need extra space, but adding a boolean field to each node will require the least space, since it will be 1 bit per node, vs. 1 pointer per node for a stack.
You don't really need a distance cut-off if you are searching a finite graph and you only visit each node once, because then you will visit at most N nodes in an N-node graph. You would need a depth cutoff if you were searching an infinite space, such as when doing a state-space search (an example is a Prolog interpreter searching for a proof).

You don't need the visited nodes. Just pass your current child node to the recursive method instead of the visited nodes parameter, and use the return value for carrying the path.
If you can process the path element by element, i.e. rewrite printPath() so that it can be called once per element, then the key type alone is enough as the return type. If you want to receive the whole path, you need a list of key values as the return type.
Actually, you're relatively close to the solution. Just use the call stack of the recursive method calls to represent the path.
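To illustrate the idea of carrying the path in the return value, here is a minimal sketch. It is not the asker's code: the class PathsByReturnValue, its adjacency map (a stand-in for WeightedDirectedGraph), and the parameter budget are all hypothetical names. It also assumes every edge weight is positive, since otherwise the recursion would not terminate on a cycle. Each call returns all paths from its node to the end node, and the caller prepends itself, so the call stack plays the role of the explicit stack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's graph class plus the search.
class PathsByReturnValue {
    // adjacency: node -> (child -> edge weight); assumes positive weights
    private final Map<String, Map<String, Integer>> adj = new LinkedHashMap<>();

    void addEdge(String from, String to, int weight) {
        adj.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, weight);
    }

    // Returns every path from node to end whose total weight is below budget.
    // Nodes may be revisited; only the remaining budget stops the recursion.
    List<LinkedList<String>> paths(String node, String end, int budget) {
        List<LinkedList<String>> result = new ArrayList<>();
        Map<String, Integer> children = adj.getOrDefault(node, Collections.emptyMap());
        for (Map.Entry<String, Integer> child : children.entrySet()) {
            int remaining = budget - child.getValue();
            if (remaining <= 0) {
                continue; // same cutoff as pathLength + weight >= limit
            }
            if (child.getKey().equals(end)) {
                result.add(new LinkedList<>(Arrays.asList(node, end)));
            }
            // Paths that continue through the child: prepend the current node.
            for (LinkedList<String> tail : paths(child.getKey(), end, remaining)) {
                tail.addFirst(node);
                result.add(tail);
            }
        }
        return result;
    }
}
```

Calling paths("A", "E", 20) corresponds to the question's cutoff of 20. The path length and stop count can be recomputed from each returned list, so no mutable fields are needed. Note that the paths are still stored somewhere (in the returned lists), which is consistent with the point that recording a path requires some data structure.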

Edit: This answer is completely off topic and was posted based on misinterpreting the question.
There are several things wrong with your DFS implementation. Yes, it visits all nodes in a depth-first manner and it does eventually manage to find a path between START and END, but it does not attempt to check for already visited nodes and keeps a stack for no real reason. The only reason you don't fall into infinite recursion on cycles is because you limit the maximum path length, and you will still take a long time on graphs that have multiple distinct paths between all pairs of vertices.
The only thing you are using the stack for is to pass the node to be visited next to the dfs function. You can simply get rid of the stack and pass the node directly.
So, instead of
private void depthFirst(WeightedDirectedGraph graph, Stack<String> visited) {
    ...
    visited.push(child);
    ...
    depthFirst(graph, visited);
You can simply write this as
private void depthFirst(WeightedDirectedGraph graph, String node) {
    ...
    //visited.push(child); <-- No longer needed
    ...
    depthFirst(graph, child);
You are using a data structure (stack) that you have named 'visited' and yet you do not use that to store/mark which nodes have been already visited to avoid revisiting.
You can modify your existing code to have a Set called visited (make it a global/class variable or pass it along recursive calls as you did with your stack) where you keep all nodes already visited and only call depthFirst() on those nodes that are not already in that Set.
That should make your code look something like this
private void depthFirst(WeightedDirectedGraph graph, String node, Set<String> visited) {
    visited.add(node); // mark current node as visited
    ...
    //visited.push(child); <-- No longer needed
    ...
    if (!visited.contains(child)) { // don't visit nodes we have worked on already
        depthFirst(graph, child, visited);
    }
So far my answer has tried to modify your code to make it work. But it appears to me that you need a better grasp of what a DFS actually is and how it really works. Reading the relevant chapter of any good algorithms or graph theory book would help you greatly. I would recommend CLRS (it has a very nice chapter on simple graph traversals), but any good book should do. A correct recursive DFS can be implemented much more simply using arrays, without having to resort to stacks or sets.
Edit:
I did not mention how you could retrieve the path after replacing the stack. This can be easily done by using a Map that stores the parent of each node as it is explored. The path (if any is found) can be obtained using a recursive printPath(String node) function, that prints the node passed to it and calls itself again on its parent.

Related

Graphs: DFS visit and Java implementation

I'm trying to implement a recursive depth-first search for a graph in Java.
Assumptions
The graph is represented with adjacency lists.
GraphSearchImpl is a data structure that stores the result of a visit.
GraphSearchImpl contains arrays that store, for each vertex, start/end times, visit status (UNDISCOVERED, DISCOVERED, CLOSED), weight of the path, and so on.
Every vertex has a unique index stored in a HashMap keyed by a String, which is a unique label for that vertex. I read and write the array cells for a given vertex using this index.
Code
public void visitDFSRecursive(GraphSearchImpl search, VertexImpl u, Integer time) {
    int index = search.getIndexOf(u);
    search.status[index] = VertexImpl.DISCOVERED;
    time++;
    search.startTimes[index] = time;
    for (VertexImpl v : u.getAdjList()) {
        int adjIndex = search.getIndexOf(v);
        if (search.status[adjIndex] == VertexImpl.UNDISCOVERED) {
            search.status[adjIndex] = VertexImpl.DISCOVERED;
            search.parents[adjIndex] = u;
            visitDFSRecursive(search, v, time);
        }
    }
    search.status[index] = VertexImpl.CLOSED;
    time++;
    search.endTimes[index] = time;
}
I'm calling this method like this, over a graph with just two nodes (A -> B):
g.visitDFSRecursive(search,sourceVertex,new Integer(0));
The output is the following:
-A starts at 1 ends at 2
-B starts at 2 ends at 3
It's obviously wrong, because B's start/end interval must be nested inside A's, since B is a child of A in this graph.
I know the problem is that I'm not using the time counter correctly.
Please suggest.
The problem is that time is a local variable in each call, so when a deeper call increments it, the caller's copy (the time for A) is not affected. Integer is immutable, so time++ only rebinds the local reference. You should convert it to a global/static variable, or wrap the counter in a mutable object and pass that object through the recursion.
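A minimal sketch of the shared-counter fix, under simplifying assumptions: vertices are plain int indices, the graph is a list of adjacency lists, and the class name DfsTimestamps and its fields are hypothetical rather than the asker's GraphSearchImpl. The counter is an instance field, so every recursive call increments the same value.

```java
import java.util.List;

// Hypothetical, simplified replacement for the GraphSearchImpl-based code.
class DfsTimestamps {
    private final List<List<Integer>> adj; // adj.get(u) = successors of u
    final int[] start;
    final int[] end;
    private final boolean[] discovered;
    private int time = 0; // shared by all recursive calls

    DfsTimestamps(List<List<Integer>> adj) {
        this.adj = adj;
        this.start = new int[adj.size()];
        this.end = new int[adj.size()];
        this.discovered = new boolean[adj.size()];
    }

    void visit(int u) {
        discovered[u] = true;
        start[u] = ++time; // increments the field, not a local copy
        for (int v : adj.get(u)) {
            if (!discovered[v]) {
                visit(v);
            }
        }
        end[u] = ++time; // still sees the increments made by deeper calls
    }
}
```

For the two-node graph A -> B (indices 0 and 1), visit(0) gives A the interval 1 to 4 and B the interval 2 to 3, so B's interval is nested inside A's.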

Seemingly correct BFS implementation finding paths, but not *shortest* paths (weightless edges)

Below is an attempt at an algorithm to find shortest paths in a graph with weightless edges, with one added constraint: a set of nodes that cannot be in the path. So instead of finding the absolute shortest path between nodes, it finds the shortest path that doesn't include certain nodes.
Wordnode is the node class, and HashSet avoids is the set of nodes that must be avoided. The only place in the algorithm where this comes into play is when checking whether to add a node to the queue. If it's in avoids (or if it's already been visited), don't add it. I believe the effect of this check should be equivalent to temporarily removing any edges into and out of nodes in avoids, though by using the HashSet I avoid actually mutating the data structure.
I thought the algorithm was working until I managed to get shorter paths by adding words to avoids. e.g., if avoids is empty, then for shortestPath(A, Z, {}) it might return (A, B, E, C, F, L, D, Z), but upon adding E and C to avoids and calling shortestPath(A, Z, {E, C}), I get (A, R, K, Z), which is shorter...
The graph I'm using has thousands of nodes, but I have checked that both (A, B, E, C, F, L, D, Z) and (A, R, K, Z) are valid paths. The problem is that the algorithm is returning a path of length 8 when avoids is empty, when there are demonstrably existent paths of length only 4.
This suggests to me that either my algorithm (below) is incorrect, or there are problems with my graph data structure. It will be more difficult to check the latter, so I figured I would see if anyone spots a problem below first.
So, can you see any reason the algorithm below would find shorter paths when avoids is non-empty than when it's empty?
Note: "this" is the origin, and the destination ("dest") is an argument.
Thanks
public LinkedList<String> shortestPath(Wordnode dest, int limit, HashSet<Wordnode> avoids)
{
    HashSet<Wordnode> visited = new HashSet<>();
    HashMap<Wordnode, Wordnode> previous = new HashMap<>();
    LinkedList<Wordnode> q = new LinkedList<Wordnode>();
    previous.put(this, null);
    q.add(this);
    Wordnode curr = null;
    boolean found = false;
    while(!q.isEmpty() && !found)
    {
        curr = q.removeLast();
        visited.add(curr);
        if(curr == dest)
            found = true;
        else
        {
            for(Wordnode n: curr.neighbors)
            {
                if(!visited.contains(n) && !avoids.contains(n))
                {
                    q.addFirst(n);
                    previous.put(n, curr);
                }
            }
        }
    }
    if(!found)
        return null;
    LinkedList<String> ret = new LinkedList<>();
    while(curr != null)
    {
        ret.addFirst(curr.word);
        curr = previous.get(curr);
    }
    return ret;
}
I think your problem is how you build the edge list using the previous map. You store the last seen edge when queuing nodes, but this edge may not lie on the shortest path.
You check for dest when you pull it from the queue, but the edge stored in previous for the dest node may no longer be the edge that was followed to get to dest when it was added to the queue.
When you provide avoids nodes you skip the process of updating the edges in previous so you may end up with a shorter path - it is not whether avoids is specified or not, but rather whether avoids contains nodes on the longer path that may 'corrupt' the edge list.
Your BFS is correct. The problem is how you write out the path that was found. The shortest path in BFS means "the number of levels from the source to the destination". But you are counting the number of unique nodes that were examined on the way from the source to the destination.
Consider the graph of 3 nodes each connected to each other:
  B
 /|
A |
 \|
  C
The path A-C is 1 level long. Your implementation can give a path length of 2, because the nodes can be visited as A-B and then C. The order will depend on your input data.
So you need to count levels.
I'm adding an answer here because none of the previous answers identifies the actual error.
The problem is where the nodes are marked as visited: in the original this is done when the node is removed from the queue, which means that until a given node reaches the front of the queue it may be added several times, and each time its entry in previous is overwritten, which alters the path construction.
You must mark each node when you enqueue it, so right after the line q.addFirst(n) just add visited.add(n).
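A minimal sketch of BFS with the mark-on-enqueue fix, written against a simplified graph rather than the asker's Wordnode class. The class name BfsAvoid, the adjacency map keyed by String, and the static method signature are assumptions made for illustration. Because a node is marked the moment it is queued, its entry in previous is written exactly once, by the first (and therefore shallowest) node that reaches it.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified version: nodes are Strings, graph is an adjacency map.
class BfsAvoid {
    // Returns the shortest path from src to dest that skips nodes in avoids,
    // or null if no such path exists.
    static List<String> shortestPath(Map<String, List<String>> graph,
                                     String src, String dest, Set<String> avoids) {
        Set<String> visited = new HashSet<>();
        Map<String, String> previous = new HashMap<>();
        Deque<String> q = new ArrayDeque<>();
        visited.add(src);
        q.addLast(src);
        while (!q.isEmpty()) {
            String curr = q.removeFirst();
            if (curr.equals(dest)) {
                // Walk the previous links back to src (src has no entry).
                LinkedList<String> path = new LinkedList<>();
                for (String n = dest; n != null; n = previous.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String n : graph.getOrDefault(curr, Collections.emptyList())) {
                if (!visited.contains(n) && !avoids.contains(n)) {
                    visited.add(n);        // mark when enqueuing, not when dequeuing
                    previous.put(n, curr); // written once per node
                    q.addLast(n);
                }
            }
        }
        return null;
    }
}
```

On a graph with edges A -> B, A -> C, B -> C and C -> Z, marking on dequeue lets B overwrite previous for C and yields A, B, C, Z, while this version returns A, C, Z.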

Graph traversal - finding and returning the shortest distance

I am using a Breadth first search in a program that is trying to find and return the shortest path between two nodes on an unweighted digraph.
My program works like the Wikipedia page pseudocode:
The algorithm uses a queue data structure to store intermediate results as it traverses the graph, as follows:
1. Enqueue the root node.
2. Dequeue a node and examine it.
   - If the element sought is found in this node, quit the search and return a result.
   - Otherwise enqueue any successors (the direct child nodes) that have not yet been discovered.
3. If the queue is empty, every node on the graph has been examined – quit the search and return "not found".
4. If the queue is not empty, repeat from Step 2.
So I have been thinking about how to track the number of steps made, but I am having trouble with the limitations of Java (I am not very knowledgeable about how Java works). I originally thought that I could create a queue made up of a data type of my own that stores steps and nodes, so that as it traverses the graph it keeps track of the steps. If the goal is ever reached, simply return the steps.
I don't know how to make this work in java so I had to get rid of that idea and I moved on to using that wonky Queue = new LinkedList implementation of a queue. So basically I think it is a normal integer queue, I couldn't get my data type I made to work with it.
So now I have to find a more basic approach so I tried to use a simple counter, this doesn't work because the traversal algorithm searches down many paths before reaching the shortest one so I had an idea. I added a second queue that tracked steps, and I added a couple counters. Any time a node is added to the first queue I add to the counter, meaning I know that I am inspecting new nodes so I am not a distance further away. Once all those have been inspected I can then increase the step counter and any time a node is added to the first queue I add the step value to the step queue. The step queue is managed just like the node queue so that when the goal node is found the corresponding step should be the one to be dequeued out.
This doesn't work though and I was having a lot of problems with it, I am actually not sure why.
I deleted most of my code in panic and frustration but I will start to try and recreate it and post it here if anyone needs me to.
Were any of my ideas close and how can I make them work? I am sure there is a standard and simple way of doing this as well that I am not clever enough to see.
Code would help. What data structure are you using to store the partial or candidate solutions? You say you're using a queue to store nodes to be examined, but really the objects stored in the queue should wrap some structure (e.g. List) that indicates the nodes traversed to get to the node to be examined. So, instead of simple Nodes being stored in the queue, some more complex object would be needed to make available the information necessary to know the complete path taken to that point. A simple node would only have information about itself and its children. But if you're examining node X, you also need to know how you arrived at node X. Just knowing node X isn't enough, and the only way (I know of) to know the path taken to node X is to store the path in the object that represents a "partial solution" or "candidate solution". If this is done, then finding the length of the path is trivial, because it's just the length of this list (or whichever data structure is chosen). Hope I'm making some sense here. If not, post code and I'll take a look.
EDIT
These bits of code help show what I mean (they're by no means complete):
public class Solution {
    List<Node> path;
}
Queue<Solution> q;
NOT
Queue<Node> q;
EDIT 2
If all you need is the length of the path, and not the path, per se, then try something like this:
public class Solution {
    Node node; // whatever represents a node in your algorithm.
    int len;   // the length of the path to this node.
}
// Your queue:
LinkedList<Solution> q;
With this, before enqueuing a candidate solution (node), you do something like:
Solution sol = new Solution();
sol.node = childNodeToEnqueue;
sol.len = parentNode.len + 1;
q.add(sol);
The easiest way to track distance during a traversal is to add a simple array (or a map if your vertices are not indexed by integers).
Here is the algorithm in pseudocode:
shortest_path(g, src, dst):
    q = new empty queue
    distances = int array of length order of g
    for i = 0 to order: distances[i] = -1
    distances[src] = 0
    enqueue src in q
    while q is not empty:
        cur = pop next element in q
        if cur is dst: return distances[dst]
        foreach s in successors of cur in g:
            if distances[s] == -1:
                distances[s] = distances[cur] + 1
                enqueue s in q
    return not found
Note: order of a graph is the number of vertices
You don't need special data structures; the queue can just contain the vertices' ids (probably integers). In Java, the LinkedList class implements the Queue interface, so it's a good candidate for your queue. For the distances array, if your vertices are identified by integers, an integer array is enough; otherwise you need some kind of map.
You can also separate the vertex marking (the -1 in my algorithm) into a separate boolean array or a set, but it's not really necessary and will waste some space.
If you want the path, you can also get it with a simple parent array: for each vertex you store its parent in the traversal, so just add parent[s] = cur when you enqueue the successor. Then retrieving the path (in reverse order) is as simple as this:
path = new empty stack
cur = dst
while cur != src:
    push cur in path
    cur = parent[cur]
push src in path
And there you are …
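The pseudocode above can be sketched in Java roughly as follows. This assumes vertices are numbered 0 to n-1 and that the graph is given as a list of successor lists; the class name BfsDistance and the choice to return an empty list for "not found" are my own assumptions, not part of the answer. The distance in steps is the size of the returned path minus one.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper: BFS with a distances array and a parent array.
class BfsDistance {
    // Returns the vertices on a shortest path from src to dst (inclusive),
    // or an empty list if dst cannot be reached.
    static List<Integer> shortestPath(List<List<Integer>> successors, int src, int dst) {
        int n = successors.size();
        int[] distances = new int[n];
        int[] parent = new int[n];
        Arrays.fill(distances, -1); // -1 means "not discovered yet"
        Arrays.fill(parent, -1);    // -1 means "no parent" (the source)
        distances[src] = 0;
        Deque<Integer> q = new ArrayDeque<>();
        q.add(src);
        while (!q.isEmpty()) {
            int cur = q.remove();
            if (cur == dst) {
                break;
            }
            for (int s : successors.get(cur)) {
                if (distances[s] == -1) {
                    distances[s] = distances[cur] + 1;
                    parent[s] = cur;
                    q.add(s);
                }
            }
        }
        LinkedList<Integer> path = new LinkedList<>();
        if (distances[dst] == -1) {
            return path; // not found
        }
        // Follow the parent links back from dst; addFirst reverses the order.
        for (int cur = dst; cur != -1; cur = parent[cur]) {
            path.addFirst(cur);
        }
        return path;
    }
}
```

Using addFirst on a LinkedList replaces the stack from the pseudocode, so the path comes out in source-to-destination order.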

How to traverse back up a tree from a sibling node

I have a tree traversal written in JSP which goes through an XML file.
When I get to a certain Text Node, I'd like to be able to search back up the tree to find a certain element associated with that node.
I'm thinking I need to do a For loop and use some kind of 'getLastNode' or 'getParentNode' function. Would this be the correct method? I'm a little unsure of the syntax, so any help would be much appreciated!
I did a bit of search and I can't find anything which demonstrates what I'm trying to do nor can I find a list of the functions I'm after.
You need to keep calling getParentNode until you hit a node matching your criteria. For example:
public Node searchUpFor(String tagToFind, Node aNode) {
    Node n = aNode.getParentNode();
    while (n != null && !n.getNodeName().equals(tagToFind)) {
        n = n.getParentNode();
    }
    return n;
}

Debugging BFS tree traversal algorithm

I'm working alone on this project and could use another set of eyes to look at this to see what I am doing wrong. The first loop runs infinitely.
public void bfs(String start)
{
    //Initial Case
    add_queue.add(start);
    graph.visit(start);
    Iterator<String> neighbors;
    String neighbor;
    while(!add_queue.empty())
    {
        neighbors = graph.neighbors(start);
        neighbor = neighbors.next();
        graph.visit(neighbor);
        add_queue.add(neighbor);
        while(neighbors.hasNext())
        {
            neighbor = neighbors.next();
            if(!graph.isVisited(neighbor)) //If vertex is not visited it is new and is added to the queue
            {
                add_queue.add(neighbor);
                graph.visit(neighbor);
            }
        }
        start = add_queue.remove();
        remove_queue.add(start); //transfers vertex from add_queue to remove queue so that the order that the vertices were traversed is stored in memory
    }
}
I think you are adding the first vertex of neighbors without checking whether it has already been visited, here:
neighbor = neighbors.next(); <- you get first
graph.visit(neighbor); <- you visit
add_queue.add(neighbor); <- you add it without any check
while(neighbors.hasNext())
{
    neighbor = neighbors.next();
    if(!graph.isVisited(neighbor)) <- you do check for the others
    {
        add_queue.add(neighbor);
        graph.visit(neighbor);
    }
}
This means that you will never empty that queue, since it starts with a size of 1, and on each iteration you remove 1 element but add at least 1 element (there is never an iteration in which you add none).
What's add_queue's definition of empty()?
It could be a bad naming issue, but it sounds like empty() does something, not just checks whether it is empty (which would be probably called isEmpty()).
Also, it looks like you always add at least 1 to add_queue in each outer loop (right before the inner while), but only remove one item from add_queue per iteration.
A few places to investigate:
Check to make sure that graph.isVisited() is actually recognizing when a node has been visited via graph.visit().
Is graph.neighbors(start) truly returning start's neighbors? And not including start in this list?
Your code is a little unclear. What exactly does graph.neighbors return?
In general, to do a BFS you want to add the children of the current node to the queue, not its neighbors. Since it's all going into a queue, this will ensure that you visit each node in the tree in the correct order. Assuming that it's a tree and not a general graph, this will also ensure that you don't visit a node more than once, allowing you to remove the checks to isVisited.
So, get the next node out of the queue, add all of its children to the queue, visit the node, and repeat until the queue is empty.
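A minimal sketch of the loop with the unconditional first-neighbor add removed, so that every neighbor goes through the same visited check. It does not use the asker's graph class: the adjacency map stands in for graph.neighbors, the visited set for graph.visit and graph.isVisited, and the returned list for remove_queue. The class name BfsOrder is hypothetical. The visited check is kept so the sketch also terminates on a general graph with cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified version of the question's bfs method.
class BfsOrder {
    // Returns the vertices in the order in which BFS dequeues them.
    static List<String> bfs(Map<String, List<String>> graph, String start) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove(); // dequeue first, then expand
            order.add(current);
            for (String neighbor : graph.getOrDefault(current, Collections.emptyList())) {
                // Set.add returns false if the vertex was already marked,
                // so each vertex is enqueued at most once.
                if (visited.add(neighbor)) {
                    queue.add(neighbor);
                }
            }
        }
        return order;
    }
}
```

Because each vertex enters the queue at most once and one vertex leaves per iteration, the loop runs at most once per vertex and cannot spin forever.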
