Neo4j Traversal Description - Custom Evaluator - java

I am using Neo4j Traversal Framework(Java).
I need to create a custom evaluator that includes nodes where some condition is true.
My code is:
Evaluator e = new Evaluator() {
@Override
public Evaluation evaluate(Path path) {
log.info("Node Id: " + path.endNode().getProperty("DIST_ID"));
long mCount = 0;
if ((Long) path.endNode().getProperty("RANK") >= 3) {
mCount++;
}
log.info("mCount " + mCount);
return Evaluation.INCLUDE_AND_CONTINUE;
}
};
TraversalDescription traversalDescription = db.traversalDescription();
Traverser traverser = traversalDescription.breadthFirst()
.relationships(RelationshipTypes.SPONSOR, Direction.OUTGOING).evaluator(e).traverse(DSTS);
DSTS is the incoming node, i.e. the top node. I want to split the downline nodes by rank. For example, if I need two levels, I want to split the downline into two levels using rank. Rank is one of the properties of a node. If the rank is 5, I want to collect that node and its outgoing nodes until I reach rank 5 again.
If there is any way to do this, please guide me.

You could look at branch state, i.e. use the evaluate method which also takes a BranchState argument. You can use that to keep state for each visited traversal branch and augment when moving down traversal branches. E.g.
new PathEvaluator<Long>() {
public Evaluation evaluate(Path path, BranchState<Long> state) {
log.info("Node Id: " + path.endNode().getProperty("DIST_ID"));
long mCount = state.getState();
if ((Long) path.endNode().getProperty("RANK") >= 3) {
mCount++;
state.setState( mCount );
}
if ( mCount >= 5 ) {
// do something
}
log.info("mCount " + mCount);
return Evaluation.INCLUDE_AND_CONTINUE;
}
}
This is equivalent to counting the nodes with RANK >= 3 along the whole path on every evaluate call, but the BranchState makes this more performant.
Is this something you were thinking of?

Related

Get all possible links between two strings [duplicate]

I am working on an implementation of Dijkstra's Algorithm to retrieve the shortest path between interconnected nodes on a network of routes. I have the implementation working. It returns all the shortest paths to all the nodes when I pass the start node into the algorithm.
My question:
How does one go about retrieving all possible paths from Node A to, say, Node G or even all possible paths from Node A and back to Node A?
Finding all possible paths is a hard problem, since there can be an exponential number of simple paths. Even finding the kth shortest path [or the longest path] is NP-hard.
One possible solution to find all paths [or all paths up to a certain length] from s to t is BFS without keeping a visited set; for the weighted version, you might want to use uniform-cost search.
Note also that in every graph which has cycles [i.e. is not a DAG] there might be an infinite number of paths from s to t.
I've implemented a version where it basically finds all possible paths from one node to the other, but it doesn't count any possible 'cycles' (the graph I'm using is cyclical). So basically, no one node will appear twice within the same path. And if the graph were acyclical, then I suppose you could say it seems to find all the possible paths between the two nodes. It seems to be working just fine, and for my graph size of ~150, it runs almost instantly on my machine, though I'm sure the running time must be something like exponential and so it'll start to get slow quickly as the graph gets bigger.
Here is some Java code that demonstrates what I'd implemented. I'm sure there must be more efficient or elegant ways to do it as well.
Stack connectionPath = new Stack();
List<Stack> connectionPaths = new ArrayList<>();
// Push to connectionsPath the object that would be passed as the parameter 'node' into the method below
void findAllPaths(Object node, Object targetNode) {
for (Object nextNode : nextNodes(node)) {
if (nextNode.equals(targetNode)) {
Stack temp = new Stack();
for (Object node1 : connectionPath)
temp.add(node1);
connectionPaths.add(temp);
} else if (!connectionPath.contains(nextNode)) {
connectionPath.push(nextNode);
findAllPaths(nextNode, targetNode);
connectionPath.pop();
}
}
}
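For comparison, here is a minimal Python sketch of the same backtracking idea. It is not the code above: the function name find_all_simple_paths and the dict-of-lists graph shape are assumptions made for illustration, and unlike the Java version it appends the target node to each recorded path.

```python
from typing import Dict, Hashable, List


def find_all_simple_paths(graph: Dict[Hashable, List[Hashable]], start, target) -> List[list]:
    """Return every simple path (no node repeated) from start to target."""
    paths = []
    current = [start]  # nodes on the path currently being explored

    def visit(node):
        for nxt in graph.get(node, []):
            if nxt == target:
                paths.append(current + [nxt])  # store a copy with the target appended
            elif nxt not in current:
                current.append(nxt)
                visit(nxt)
                current.pop()  # backtrack before trying the next neighbour

    visit(start)
    return paths


if __name__ == "__main__":
    demo = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": ["A"]}
    for path in find_all_simple_paths(demo, "A", "D"):
        print(path)
```

Passing the same node as start and target returns the simple cycles through that node, which covers the "from Node A and back to Node A" case from the question.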
I'm gonna give you a (somewhat small) version (although comprehensible, I think) of a scientific proof that you cannot do this under a feasible amount of time.
What I'm gonna prove is that the time complexity to enumerate all simple paths between two selected and distinct nodes (say, s and t) in an arbitrary graph G is not polynomial. Notice that, as we only care about the amount of paths between these nodes, the edge costs are unimportant.
Sure that, if the graph has some well selected properties, this can be easy. I'm considering the general case though.
Suppose that we have a polynomial algorithm that lists all simple paths between s and t.
If G is connected, the list is nonempty. If G is not and s and t are in different components, it's really easy to list all paths between them, because there are none! If they are in the same component, we can pretend that the whole graph consists only of that component. So let's assume G is indeed connected.
The number of listed paths must then be polynomial, otherwise the algorithm couldn't return them all to me. Since it enumerates all of them, the longest one must be in the list, and a simple scan over the list tells me which one it is.
If G has a Hamiltonian path between s and t, that path visits every vertex of G, so it is necessarily the longest simple path in the list; checking whether the longest listed path covers all vertices therefore decides whether such a Hamiltonian path exists. Thus, we would have solved the Hamiltonian Path problem with a polynomial procedure! But this is a well-known NP-hard problem.
We can then conclude that this polynomial algorithm we thought we had is very unlikely to exist, unless P = NP.
The following functions (modified BFS with a recursive path-finding function between two nodes) will do the job for an acyclic graph:
from collections import defaultdict

# modified BFS
def find_all_parents(G, s):
    Q = [s]
    parents = defaultdict(set)
    while len(Q) != 0:
        v = Q[0]
        Q.pop(0)
        for w in G.get(v, []):
            parents[w].add(v)
            Q.append(w)
    return parents

# recursive path-finding function (assumes that there exists a path in G from a to b)
def find_all_paths(parents, a, b):
    return [a] if a == b else [y + b for x in list(parents[b]) for y in find_all_paths(parents, a, x)]
For example, with the following graph (DAG) G given by
G = {'A':['B','C'], 'B':['D'], 'C':['D', 'F'], 'D':['E', 'F'], 'E':['F']}
if we want to find all paths between the nodes 'A' and 'F' (using the above-defined functions as find_all_paths(find_all_parents(G, 'A'), 'A', 'F')), it will return the following paths:
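A self-contained way to check this is the sketch below. It is a variant of the two functions, not the answer's exact code: it assumes a deque-based queue with a seen set (so each node is expanded once) and represents paths as tuples so node names longer than one character also work. The result is sorted because set iteration order is not guaranteed.

```python
from collections import defaultdict, deque


def find_all_parents(G, s):
    """BFS from s recording, for every reachable node, the set of its direct predecessors."""
    parents = defaultdict(set)
    queue = deque([s])
    seen = {s}
    while queue:
        v = queue.popleft()
        for w in G.get(v, []):
            parents[w].add(v)
            if w not in seen:  # expand each node only once
                seen.add(w)
                queue.append(w)
    return parents


def find_all_paths(parents, a, b):
    """All paths from a to b as tuples of nodes; only valid for an acyclic graph."""
    if a == b:
        return [(a,)]
    return [path + (b,) for p in parents.get(b, ()) for path in find_all_paths(parents, a, p)]


G = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'F'], 'D': ['E', 'F'], 'E': ['F']}
paths = sorted(''.join(p) for p in find_all_paths(find_all_parents(G, 'A'), 'A', 'F'))
print(paths)  # ['ABDEF', 'ABDF', 'ACDEF', 'ACDF', 'ACF']
```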
Here is an algorithm finding and printing all paths from s to t using a modification of DFS. Dynamic programming can also be used to find the count of all possible paths. The pseudo code will look like this:
AllPaths(G(V,E),s,t)
    C[0...n-1] //array of integers for storing path count from 's' to i
    TopologicallySort(G(V,E)) //here suppose 's' is at i0 and 't' is at i1 index
    for i<-0 to n-1
        if i<i0
            C[i]<-0 //there is no path from 's' to a vertex ordered on the left of 's' after the topological sort
        else if i==i0
            C[i]<-1
        else
            C[i]<-0
            for each j with an edge (j,i) in E //predecessors of i, all ordered before i
                C[i]<- C[i]+C[j]
    return C[i1]
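The counting idea can be sketched in runnable form as follows. This is an illustration under assumptions, not the answer's own code: the function name count_paths, the dict-of-successors graph shape and the use of Kahn's algorithm for the topological sort are all choices made here. Counts are pushed forward along outgoing edges, which is equivalent to summing over predecessors.

```python
from collections import deque


def count_paths(graph, s, t):
    """Count distinct paths from s to t in a DAG given as {node: [successors]}."""
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    indegree = {v: 0 for v in nodes}
    for succ in graph.values():
        for w in succ:
            indegree[w] += 1

    # Kahn's algorithm: repeatedly take nodes with no remaining incoming edges.
    order = []
    ready = deque(v for v in nodes if indegree[v] == 0)
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in graph.get(v, []):
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; path counts are not well defined")

    count = {v: 0 for v in nodes}
    count[s] = 1  # the empty path from s to itself
    for v in order:  # every predecessor of v is finished before v is used
        for w in graph.get(v, []):
            count[w] += count[v]
    return count.get(t, 0)
```

This runs in time linear in the number of vertices and edges, even when the number of paths itself is exponential, because the paths are counted rather than listed.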
If you actually care about ordering your paths from shortest path to longest path then it would be far better to use a modified A* or Dijkstra Algorithm. With a slight modification the algorithm will return as many of the possible paths as you want in order of shortest path first. So if what you really want are all possible paths ordered from shortest to longest then this is the way to go.
If you want an A*-based implementation capable of returning all paths ordered from the shortest to the longest, the following will accomplish that. It has several advantages. First, it produces paths in order from shortest to longest without a separate sort. It also computes each additional path only when needed, so if you stop early because you don't need every single path, you save some processing time. It reuses the pending search state each time it calculates the next path, so earlier work is not repeated. Finally, if you find the path you wanted, you can abort early and save some computation time. Overall this should be an efficient choice if you care about ordering by path length.
import java.util.*;
public class AstarSearch {
private final Map<Integer, Set<Neighbor>> adjacency;
private final int destination;
private final NavigableSet<Step> pending = new TreeSet<>();
public AstarSearch(Map<Integer, Set<Neighbor>> adjacency, int source, int destination) {
this.adjacency = adjacency;
this.destination = destination;
this.pending.add(new Step(source, null, 0));
}
public List<Integer> nextShortestPath() {
Step current = this.pending.pollFirst();
while( current != null) {
if( current.getId() == this.destination )
return current.generatePath();
for (Neighbor neighbor : this.adjacency.get(current.id)) {
if(!current.seen(neighbor.getId())) {
final Step nextStep = new Step(neighbor.getId(), current, current.cost + neighbor.cost + predictCost(neighbor.id, this.destination));
this.pending.add(nextStep);
}
}
current = this.pending.pollFirst();
}
return null;
}
protected int predictCost(int source, int destination) {
return 0; //Behaves identical to Dijkstra's algorithm, override to make it A*
}
private static class Step implements Comparable<Step> {
final int id;
final Step parent;
final int cost;
public Step(int id, Step parent, int cost) {
this.id = id;
this.parent = parent;
this.cost = cost;
}
public int getId() {
return id;
}
public Step getParent() {
return parent;
}
public int getCost() {
return cost;
}
public boolean seen(int node) {
if(this.id == node)
return true;
else if(parent == null)
return false;
else
return this.parent.seen(node);
}
public List<Integer> generatePath() {
final List<Integer> path;
if(this.parent != null)
path = this.parent.generatePath();
else
path = new ArrayList<>();
path.add(this.id);
return path;
}
@Override
public int compareTo(Step step) {
if(step == null)
return 1;
if( this.cost != step.cost)
return Integer.compare(this.cost, step.cost);
if( this.id != step.id )
return Integer.compare(this.id, step.id);
if( this.parent != null )
return this.parent.compareTo(step.parent); // the result was previously discarded
if(step.parent == null)
return 0;
return -1;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Step step = (Step) o;
return id == step.id &&
cost == step.cost &&
Objects.equals(parent, step.parent);
}
@Override
public int hashCode() {
return Objects.hash(id, parent, cost);
}
}
/*******************************************************
* Everything below here just sets up your adjacency *
* It will just be helpful for you to be able to test *
* It isnt part of the actual A* search algorithm *
********************************************************/
private static class Neighbor {
final int id;
final int cost;
public Neighbor(int id, int cost) {
this.id = id;
this.cost = cost;
}
public int getId() {
return id;
}
public int getCost() {
return cost;
}
}
public static void main(String[] args) {
final Map<Integer, Set<Neighbor>> adjacency = createAdjacency();
final AstarSearch search = new AstarSearch(adjacency, 1, 4);
System.out.println("printing all paths from shortest to longest...");
List<Integer> path = search.nextShortestPath();
while(path != null) {
System.out.println(path);
path = search.nextShortestPath();
}
}
private static Map<Integer, Set<Neighbor>> createAdjacency() {
final Map<Integer, Set<Neighbor>> adjacency = new HashMap<>();
//This sets up the adjacencies. In this case all adjacencies have a cost of 1, but they dont need to.
addAdjacency(adjacency, 1,2,1,5,1); //{1 | 2,5}
addAdjacency(adjacency, 2,1,1,3,1,4,1,5,1); //{2 | 1,3,4,5}
addAdjacency(adjacency, 3,2,1,5,1); //{3 | 2,5}
addAdjacency(adjacency, 4,2,1); //{4 | 2}
addAdjacency(adjacency, 5,1,1,2,1,3,1); //{5 | 1,2,3}
return Collections.unmodifiableMap(adjacency);
}
private static void addAdjacency(Map<Integer, Set<Neighbor>> adjacency, int source, Integer... dests) {
if( dests.length % 2 != 0)
throw new IllegalArgumentException("dests must have an even number of arguments, each pair is the id and cost for that traversal");
final Set<Neighbor> destinations = new HashSet<>();
for(int i = 0; i < dests.length; i+=2)
destinations.add(new Neighbor(dests[i], dests[i+1]));
adjacency.put(source, Collections.unmodifiableSet(destinations));
}
}
The output from the above code is the following:
[1, 2, 4]
[1, 5, 2, 4]
[1, 5, 3, 2, 4]
Notice that each time you call nextShortestPath() it generates the next shortest path for you on demand. It only calculates the extra steps needed and doesn't traverse any old paths twice. Moreover, if you decide you don't need all the paths and end execution early, you've saved yourself considerable computation time. You only compute up to the number of paths you need and no more.
Finally, it should be noted that the A* and Dijkstra algorithms do have some minor limitations, though I don't think they would affect you. Namely, they will not work correctly on a graph that has negative weights.
Here is a link to JDoodle where you can run the code yourself in the browser and see it working. You can also change around the graph to show it works on other graphs as well: http://jdoodle.com/a/ukx
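The same on-demand idea can be sketched compactly as a Python generator. This is an illustration only, not a port of the class above: the name paths_by_cost and the adjacency shape (node mapped to a list of (neighbor, cost) pairs) are assumptions, and it uses no heuristic, so it behaves like the Dijkstra-style variant.

```python
import heapq
from itertools import count


def paths_by_cost(adjacency, source, destination):
    """Lazily yield (cost, path) for simple paths from source to destination, cheapest first.

    adjacency maps node -> iterable of (neighbor, edge_cost); costs must be non-negative.
    """
    tie = count()  # tie-breaker so the heap never has to compare two paths
    heap = [(0, next(tie), (source,))]
    while heap:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == destination:
            yield cost, list(path)
            continue  # a finished path is never extended
        for neighbor, edge_cost in adjacency.get(node, ()):
            if neighbor not in path:  # keep every path simple
                heapq.heappush(heap, (cost + edge_cost, next(tie), path + (neighbor,)))


adjacency = {
    1: [(2, 1), (5, 1)],
    2: [(1, 1), (3, 1), (4, 1), (5, 1)],
    3: [(2, 1), (5, 1)],
    4: [(2, 1)],
    5: [(1, 1), (2, 1), (3, 1)],
}
for cost, path in paths_by_cost(adjacency, 1, 4):
    print(cost, path)
```

Because it is a generator, the caller can stop after the first few paths and the remaining work is simply never done.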
find_paths[s, t, d, k]
This question is now a bit old... but I'll throw my hat into the ring.
I personally find an algorithm of the form find_paths[s, t, d, k] useful, where:
s is the starting node
t is the target node
d is the maximum depth to search
k is the number of paths to find
Using your programming language's form of infinity for d and k will give you all paths§.
§ obviously if you are using a directed graph and you want all undirected paths between s and t you will have to run this both ways:
find_paths[s, t, d, k] <join> find_paths[t, s, d, k]
Helper Function
I personally like recursion, although it can be difficult sometimes. Anyway, first let's define our helper function:
def find_paths_recursion(graph, current, goal, current_depth, max_depth, num_paths, current_path, paths_found):
    current_path.append(current)
    if current_depth > max_depth:
        current_path.pop()  # undo the append before giving up on this branch
        return
    if current == goal:
        if len(paths_found) < num_paths:
            paths_found.append(copy(current_path))
        current_path.pop()
        return
    else:
        for successor in graph[current]:
            find_paths_recursion(graph, successor, goal, current_depth + 1, max_depth, num_paths, current_path, paths_found)
        current_path.pop()
Main Function
With that out of the way, the core function is trivial:
def find_paths[s, t, d, k]:
    paths_found = [] # PASSING THIS BY REFERENCE
    find_paths_recursion(graph, s, t, 0, d, k, [], paths_found)
    return paths_found
First, let's notice a few things:
the above pseudo-code is a mash-up of languages - but most strongly resembling python (since I was just coding in it). A strict copy-paste will not work.
[] is an uninitialized list, replace this with the equivalent for your programming language of choice
paths_found is passed by reference. It is clear that the recursion function doesn't return anything. Handle this appropriately.
here graph is assuming some form of hashed structure. There are a plethora of ways to implement a graph. Either way, graph[vertex] gets you a list of adjacent vertices in a directed graph - adjust accordingly.
this assumes you have pre-processed to remove "buckles" (self-loops), cycles and multi-edges
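Taking those notes into account, a runnable Python sketch of the find_paths[s, t, d, k] idea could look like this. It is one possible reading of the pseudo-code, not the author's exact code: the nested helper, the math.inf defaults and the dict-of-lists graph are assumptions made here. Depth is counted in edges, and on a graph with cycles a finite d is required for the search to terminate.

```python
import math


def find_paths(graph, s, t, d=math.inf, k=math.inf):
    """Collect up to k paths from s to t that use at most d edges."""
    paths_found = []
    current_path = []

    def recurse(current, depth):
        if depth > d or len(paths_found) >= k:
            return  # too deep, or enough paths already collected
        current_path.append(current)
        if current == t:
            paths_found.append(list(current_path))  # store a copy
        else:
            for successor in graph.get(current, []):
                recurse(successor, depth + 1)
        current_path.pop()  # backtrack

    recurse(s, 0)
    return paths_found


G = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'F'], 'D': ['E', 'F'], 'E': ['F']}
print(find_paths(G, 'A', 'F'))        # all five paths
print(find_paths(G, 'A', 'F', d=2))   # [['A', 'C', 'F']]
print(find_paths(G, 'A', 'F', k=2))   # the first two paths found
```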
You usually don't want to, because there is an exponential number of them in nontrivial graphs; if you really want to get all (simple) paths, or all (simple) cycles, you just find one (by walking the graph), then backtrack to another.
I think what you want is some form of the Ford–Fulkerson algorithm, whose Edmonds–Karp variant is based on BFS. It's used to calculate the max flow of a network by repeatedly finding augmenting paths between two nodes.
http://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm
There's a nice article which may answer your question /only it prints the paths instead of collecting them/.
Please note that you can experiment with the C++/Python samples in the online IDE.
http://www.geeksforgeeks.org/find-paths-given-source-destination/
I suppose you want to find 'simple' paths (a path is simple if no node appears in it more than once, except maybe the 1st and the last one).
Since the problem is NP-hard, you might want to do a variant of depth-first search.
Basically, generate all possible paths from A and check whether they end up in G.

Is it possible to get next element in the Stream?

I am trying to convert a for loop to functional code. I need to look ahead one value and also look behind one value. Is it possible using streams?
The following code is to convert the Roman text to numeric value.
Not sure if reduce method with two/three arguments can help here.
int previousCharValue = 0;
int total = 0;
for (int i = 0; i < input.length(); i++) {
char current = input.charAt(i);
RomanNumeral romanNum = RomanNumeral.valueOf(Character.toString(current));
if (previousCharValue > 0) {
total += (romanNum.getNumericValue() - previousCharValue);
previousCharValue = 0;
} else {
if (i < input.length() - 1) {
char next = input.charAt(i + 1);
RomanNumeral nextNum = RomanNumeral.valueOf(Character.toString(next));
if (romanNum.getNumericValue() < nextNum.getNumericValue()) {
previousCharValue = romanNum.getNumericValue();
}
}
if (previousCharValue == 0) {
total += romanNum.getNumericValue();
}
}
}
No, this is not possible using streams, at least not easily. The stream API abstracts away from the order in which the elements are processed: the stream might be processed in parallel, or in reverse order. So "the next element" and "previous element" do not exist in the stream abstraction.
You should use the API best suited for the job: streams are excellent if you need to apply some operation to all elements of a collection and you are not interested in the order. If you need to process the elements in a certain order, you have to use iterators or maybe access the list elements through indices.
I haven't seen such a use case with streams, so I cannot say whether it is possible or not. But when I need to use streams with an index, I choose IntStream#range(0, table.length), and then in the lambdas I get the value from this table/list.
For example
int[] arr = {1,2,3,4};
int result = IntStream.range(0, arr.length)
.map(idx->idx>0 ? arr[idx] + arr[idx-1]:arr[idx])
.sum();
By the nature of the stream, you don't know the next element unless you read it. Therefore, directly obtaining the next element is not possible while processing the current element. However, since you are reading the current element, you obviously know what was read before, so to achieve goals such as "accessing the previous element" and "accessing the next element", you can rely on the history of elements that were already processed.
The following two solutions are possible for your problem:
Get access to previously read elements. This way you know the current element and a defined number of previously read elements.
Assume that at the moment of stream processing you read the next element and that the current element was read in the previous iteration. In other words, you consider the previously read element as "current" and the currently processed element as "next" (see below).
Solution 1 - implementation
First we need a data structure which will allow keeping track of data flowing through the stream. Good choice could be an instance of Queue because queues by their nature allows data flowing through them. We only need to bound the queue to the number of last elements we want to know (that would be 3 elements for your use case). For this we create a "bounded" queue keeping history like this:
public class StreamHistory<T> {
private final int numberOfElementsToRemember;
private LinkedList<T> queue = new LinkedList<T>(); // queue will store at most numberOfElementsToRemember
public StreamHistory(int numberOfElementsToRemember) {
this.numberOfElementsToRemember = numberOfElementsToRemember;
}
public StreamHistory<T> save(T curElem) {
if (queue.size() == numberOfElementsToRemember) {
queue.pollLast(); // remove last to keep only requested number of elements
}
queue.offerFirst(curElem);
return this;
}
public LinkedList<T> getLastElements() {
return queue; // or return immutable copy or immutable view on the queue. Depends on what you want.
}
}
The generic parameter T is the type of actual elements of the stream. Method save returns reference to instance of current StreamHistory for better integration with java Stream api (see below) and it is not really required.
Now the only thing to do is to convert the stream of elements to the stream of instances of StreamHistory (where each next element of the stream will hold last n instances of actual objects going through the stream).
public class StreamHistoryTest {
public static void main(String[] args) {
Stream<Character> charactersStream = IntStream.range(97, 123).mapToObj(code -> (char) code); // original stream
StreamHistory<Character> streamHistory = new StreamHistory<>(3); // instance of StreamHistory which will store last 3 elements
charactersStream.map(character -> streamHistory.save(character)).forEach(history -> {
history.getLastElements().forEach(System.out::print);
System.out.println();
});
}
}
In the above example we first create a stream of all letters in the alphabet. Then we create an instance of StreamHistory, which is updated on each iteration of the map() call on the original stream. Via the call to map() we convert to a stream containing references to our instance of StreamHistory.
Note that each time the data flows through original stream the call to streamHistory.save(character) updates the content of the streamHistory object to reflect current state of the stream.
Finally in each iteration we print last 3 saved characters. The output of this method is following:
a
ba
cba
dcb
edc
fed
gfe
hgf
ihg
jih
kji
lkj
mlk
nml
onm
pon
qpo
rqp
srq
tsr
uts
vut
wvu
xwv
yxw
zyx
Solution 2 - implementation
While solution 1 will in most cases do the job and is fairly easy to follow, there are use cases where the possibility to inspect the next and previous elements is really convenient. In such a scenario we are only interested in three-element tuples (previous, current, next), and having only one element does not matter (for a simple example, consider the following riddle: "given a stream of numbers, return a tuple of three subsequent numbers which gives the highest sum"). To solve such use cases we might want a more convenient API than the StreamHistory class.
For this scenario we introduce a new variation of StreamHistory class (which we call StreamNeighbours). The class will allow to inspect the previous and the next element directly. Processing will be done in time "T-1" (that is: the currently processed original element is considered as next element, and previously processed original element is considered to be current element). This way we, in some sense, inspect one element ahead.
The modified class is following:
public class StreamNeighbours<T> {
private LinkedList<T> queue = new LinkedList<>(); // queue will store one element before current and one after
private boolean threeElementsRead; // at least three items were added - only if we have three items we can inspect "next" and "previous" element
/**
* Allows to handle situation when only one element was read, so technically this instance of StreamNeighbours is not
* yet ready to return next element
*/
public boolean isFirst() {
return queue.size() == 1;
}
/**
* Allows to read first element in case less than three elements were read, so technically this instance of StreamNeighbours is
* not yet ready to return both next and previous element
* @return the first element that was read
*/
public T getFirst() {
if (isFirst()) {
return queue.getFirst();
} else if (isSecond()) {
return queue.get(1);
} else {
throw new IllegalStateException("Call to getFirst() only possible when one or two elements were added. Call to getCurrent() instead. To inspect the number of elements call to isFirst() or isSecond().");
}
}
/**
* Allows to handle situation when only two element were read, so technically this instance of StreamNeighbours is not
* yet ready to return next element (because we always need 3 elements to have previous and next element)
*/
public boolean isSecond() {
return queue.size() == 2;
}
public T getSecond() {
if (!isSecond()) {
throw new IllegalStateException("Call to getSecond() only possible when two elements were added. Call to getFirst() or getCurrent() instead.");
}
return queue.getFirst();
}
/**
* Allows to check that this instance of StreamNeighbours is ready to return both next and previous element.
* @return true once at least three elements have been read
*/
public boolean areThreeElementsRead() {
return threeElementsRead;
}
public StreamNeighbours<T> addNext(T nextElem) {
if (queue.size() == 3) {
queue.pollLast(); // remove last to keep only three
}
queue.offerFirst(nextElem);
if (!areThreeElementsRead() && queue.size() == 3) {
threeElementsRead = true;
}
return this;
}
public T getCurrent() {
ensureReadyForReading();
return queue.get(1); // current element is always in the middle when three elements were read
}
public T getPrevious() {
if (!isFirst()) {
return queue.getLast();
} else {
throw new IllegalStateException("Unable to read previous element of first element. Call to isFirst() to know if it first element or not.");
}
}
public T getNext() {
ensureReadyForReading();
return queue.getFirst();
}
private void ensureReadyForReading() {
if (!areThreeElementsRead()) {
throw new IllegalStateException("Queue is not ready for reading (fewer than three elements were added). Call to areThreeElementsRead() to know if it's ok to call to getCurrent()");
}
}
}
Now, assuming that three elements were already read, we can directly access current element (which is the element going through the stream at time T-1), we can access next element (which is the element going at the moment through the stream) and previous (which is the element going through the stream at time T-2):
public class StreamTest {
public static void main(String[] args) {
Stream<Character> charactersStream = IntStream.range(97, 123).mapToObj(code -> (char) code);
StreamNeighbours<Character> streamNeighbours = new StreamNeighbours<Character>();
charactersStream.map(character -> streamNeighbours.addNext(character)).forEach(neighbours -> {
// NOTE: if you want to access the values before the instance of StreamNeighbours is ready to serve three elements
// you can use the methods below, like isFirst() -> getFirst(), isSecond() -> getSecond()
//
// if (neighbours.isFirst()) {
// Character currentChar = neighbours.getFirst();
// System.out.println("???" + " " + currentChar + " " + "???");
// } else if (neighbours.isSecond()) {
// Character currentChar = neighbours.getSecond();
// System.out.println(String.valueOf(neighbours.getFirst()) + " " + currentChar + " " + "???");
//
// }
//
// OTHERWISE: you are only interested in tuples consisting of three elements, so three elements need to be read
if (neighbours.areThreeElementsRead()) {
System.out.println(neighbours.getPrevious() + " " + neighbours.getCurrent() + " " + neighbours.getNext());
}
});
}
}
The output of this is following:
a b c
b c d
c d e
d e f
e f g
f g h
g h i
h i j
i j k
j k l
k l m
l m n
m n o
n o p
o p q
p q r
q r s
r s t
s t u
t u v
u v w
v w x
w x y
x y z
With the StreamNeighbours class it is easier to track the previous/next element (because we have methods with appropriate names), while with the StreamHistory class this is more cumbersome, since we need to manually "reverse" the order of the queue to achieve this.
As others stated, it's not feasible to get the next elements from within an iterated Stream.
If IntStream is used as a for loop surrogate, which merely acts as an index iteration provider, it's possible to use its range iteration index just like with for. One needs to provide a means of skipping the next element on the next iteration, though, e.g. by means of an external skip variable, like this:
AtomicBoolean skip = new AtomicBoolean();
List<String> patterns = IntStream.range(0, ptrnStr.length())
.mapToObj(i -> {
if (skip.get()) {
skip.set(false);
return "";
}
char c = ptrnStr.charAt(i);
if (c == '\\') {
skip.set(true);
return String.valueOf(new char[] { c, ptrnStr.charAt(++i) });
}
return String.valueOf(c);
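})
.filter(s -> !s.isEmpty()) // drop the empty placeholders returned for skipped characters
.collect(Collectors.toList());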
It's not pretty, but it works.
On the other hand, with for, it can be as simple as:
List<String> patterns = new ArrayList<>();
for (int i = 0; i < ptrnStr.length(); i++) {
char c = ptrnStr.charAt(i);
patterns.add(
c != '\\'
? String.valueOf(c)
: String.valueOf(new char[] { c, ptrnStr.charAt(++i) })
);
}
EDIT:
Condensed code and added for example.

Building an All Paths Algorithm for a DAG

Been trying to build a method that gets all conceivable unique paths through a DAG. Went with recursion because it seemed like the easiest to understand. Ended up with this:
public class Brutus {
//the previous nodes visited
public ArrayList<Node> resultHistory = new ArrayList<Node>();
//Directed Graph class, contains a HashMap [adjacencies]
// that has keys for all nodes that contains all edges
public AdjacencyList paths;
//A list of all the pathways between nodes represented as a list of nodes
public ArrayList<ArrayList<Node>> allPaths = new ArrayList<ArrayList<Node>>();
public Brutus(AdjacencyList paths) {
this.paths = paths;
}
public ArrayList<ArrayList<Node>> findAll() {
int counter = 1;
for (Node key : paths.adjacencies.keySet()) {
System.out.println("[" + counter + "]: " + key.toString());
StringTokenizer st = new StringTokenizer(
paths.getAdjacentString(key), ",");
while (st.hasMoreTokens()) {
String child = st.nextToken();
if (paths.getNodeFromGraph(child) != null) {
resultHistory = new ArrayList<Node>();
resultHistory.add(key);
findPath(child, resultHistory);
}
}
counter++;
}
return allPaths;
}
public void findPath(String child, ArrayList<Node> resultHistory) {
if (resultHistory.contains(paths.getNodeFromGraph(child))) {
return;
}
resultHistory.add(paths.getNodeFromGraph(child));
if(!(inList(resultHistory, allPaths))) {
allPaths.add(resultHistory);
}
StringTokenizer st = new StringTokenizer(
paths.getAdjacentString(paths.getNodeFromGraph(child)), ",");
while (st.hasMoreTokens()) {
child = st.nextToken();
if (paths.getNodeFromGraph(child) != null) {
findPath(child, resultHistory);
}
}
}
public boolean inList(ArrayList<Node> resultHistory,
ArrayList<ArrayList<Node>> allPaths) {
for (int i = 0; i < allPaths.size();i++) {
if (allPaths.get(i).equals(resultHistory)) {
return true;
}
}
return false;
}
Problem is, I don't think it works for all paths, since I can't find certain paths inside it. Although as the dataset is 900 nodes, I am unable to find a pattern! Other questions on Stack seem to be somewhat more specialized and as such I attempted to build my own algorithm!
Can anyone either suggest a superior way to perform this, or tell me what I've done wrong?
If the algorithm's correct, what would be the best way to extract all the paths between two nodes?
EDIT: I now realize that new paths don't get created from child nodes of the original, how would I make it so it does?
Here is an implementation based on the BFS algorithm.
I will denote a path as a sequence of vertices l = (v, v', v'', ...) and I will perform the following two operations on it:
extend(l, v): puts vertex v at the end of list l;
v = back(l): gets the last vertex in list l.
FindPaths(G, v) {
    // The first path is, simply, the starting node.
    // It should be the first vertex in topological order.
    pending_paths = {(v)};
    while (pending_paths is not empty) {
        l = pending_paths.remove_first(); // pop the first pending path
        output(l); // output it (or save it in a list to be returned, if you prefer)
        v = back(l); // Get the last vertex of the path
        foreach(edge (v, v') in G) { // For each edge outgoing from v...
            // extend a copy of l with v' and put it into the list of paths to be examined.
            pending_paths.push_back(extend(l, v'));
        }
    }
}
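A minimal runnable sketch of this BFS enumeration in Python follows. The function name find_paths_bfs and the dict-of-lists graph are assumptions for illustration. Paths are kept as tuples, so extending one always creates a new path and the path just output is never modified.

```python
from collections import deque


def find_paths_bfs(graph, start):
    """Yield every path that begins at start, fewest edges first.

    No visited set is kept, so this terminates only on acyclic graphs.
    """
    pending = deque([(start,)])  # the first path is just the starting node
    while pending:
        path = pending.popleft()
        yield path
        for successor in graph.get(path[-1], []):
            pending.append(path + (successor,))  # new tuple; `path` itself is untouched


if __name__ == "__main__":
    dag = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
    for p in find_paths_bfs(dag, 'A'):
        print(p)
```

To get only the paths between two given nodes, start from the first node and keep the yielded paths whose last element is the second node.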
Here's a simple recursive algorithm, expressed in pseudocode to avoid clouding the issue with lots of Java list manipulation:
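AllPaths(currentNode):
    if children(currentNode) is empty:
        return [[currentNode]]   // base case: the only path from a leaf is the leaf itself
    result = EmptyList()
    foreach child in children(currentNode):
        subpaths = AllPaths(child)
        foreach subpath in subpaths:
            Append(result, currentNode + subpath)
    return result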
Calling AllPaths on the root node will give you what you need, and you can improve the running time for nontrivial DAGs by caching the result of AllPaths on each node, so you only need to compute it once rather than once per distinct path that includes it.
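The caching idea can be sketched in Java roughly as follows. The class name `AllPathsMemo` and the adjacency-map representation are assumptions made for illustration; the sketch treats a node without children as a path of length one and assumes the graph is a DAG.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AllPathsMemo {
    private final Map<String, List<String>> children;
    // Cache of already computed results, keyed by node.
    private final Map<String, List<List<String>>> cache = new HashMap<>();

    AllPathsMemo(Map<String, List<String>> children) {
        this.children = children;
    }

    // Returns every path from `node` down to a leaf.
    List<List<String>> allPaths(String node) {
        List<List<String>> cached = cache.get(node);
        if (cached != null) {
            return cached; // computed once, reused for every parent
        }
        List<List<String>> result = new ArrayList<>();
        List<String> kids = children.getOrDefault(node, List.of());
        if (kids.isEmpty()) {
            result.add(List.of(node)); // base case: a leaf is its own path
        }
        for (String child : kids) {
            for (List<String> sub : allPaths(child)) {
                List<String> path = new ArrayList<>(sub.size() + 1);
                path.add(node);
                path.addAll(sub);
                result.add(path);
            }
        }
        cache.put(node, result);
        return result;
    }
}
```

The cache is filled with an explicit `get`/`put` rather than `computeIfAbsent`, because the recursive call modifies the map while the outer computation is still running.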
So while @akappa's pseudocode was a good start, it took me a while to understand how to make it work. If anyone else comes across this post, here's how I did it:
public ArrayList<ArrayList<Node>> searchAll() {
try {
BufferedWriter out = new BufferedWriter(new FileWriter("output.txt"));
//Gets Nodes from Hashmap and puts them into Queue
for (Node key : paths.adjacencies.keySet()) {
queue.addToQueue(new QueueItem(key.chemName, new ArrayList<Node>()));
}
while (queue.getSize() > 0) {
QueueItem queueItem = queue.getFromQueue();
Node node = paths.getNodeFromGraph(queueItem.getNodeId());
if (node != null) {
findRelationAll(node, queueItem, out);
}
}
System.out.println("Cycle complete: Number of Edges: [" + resultHistoryAll.size() + "]");
out.close();
} catch (IOException e) {
}
return resultHistoryAll;
}
public void findRelationAll(Node node, QueueItem queueItem, BufferedWriter out) {
if (!foundRelation) {
StringTokenizer st = new StringTokenizer(paths.getAdjacentString(node), ",");
while (st.hasMoreTokens()) {
String child = st.nextToken();
ArrayList<Node> history = new ArrayList<Node>();
//Gets previous Nodes
history.addAll(queueItem.getHistoryPath());
//Makes sure path is unique
if (history.contains(node)) {
System.out.println("[" + counter2 + "]: Cyclic");
counter2++;
continue;
}
history.add(node);
resultHistory = history;
queue.addToQueue(new QueueItem(child, history));
if (!(inList(resultHistory, resultHistoryAll))) {
resultHistoryAll.add(resultHistory);
try {
out.write("[" + counter + "]: " + resultHistory.toString());
out.newLine();
out.newLine();
} catch (IOException e) {
}
System.out.println("[" + counter + "]: " + resultHistory.toString());
counter++;
} else {
System.out.println("[" + counter3 + "]: InList");
counter3++;
}
}
}
}
//This checks path isn't in List already
public boolean inList(ArrayList<Node> resultHistory, ArrayList<ArrayList<Node>> allPaths) {
for (int i = 0; i < allPaths.size(); i++) {
if (allPaths.get(i).equals(resultHistory)) {
return true;
}
}
return false;
}
}
The above code does a few extra things that you might not want:
It writes pathways to a text file as a list of nodes plus its counter value.
Makes sure the path doesn't cross the same node twice.
Makes sure no two pathways are the same in the final list (in normal circumstances this is unnecessary work).
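If the duplicate check in the last point is needed, scanning the whole list in inList costs time proportional to the number of stored paths on every call. A hash-based set is one possible alternative; the sketch below is only an illustration, with a hypothetical `UniquePaths` class and node names stored as strings instead of the `Node` objects used above (with `Node`, it would rely on `Node` implementing `equals` and `hashCode` consistently).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniquePaths {
    // LinkedHashSet keeps insertion order and rejects duplicates in O(1) on average.
    private final Set<List<String>> seen = new LinkedHashSet<>();

    // Returns true if the path was not stored before.
    // An immutable copy is stored so later changes to `path` cannot corrupt the set.
    boolean add(List<String> path) {
        return seen.add(List.copyOf(path));
    }

    List<List<String>> asList() {
        return new ArrayList<>(seen);
    }
}
```

With this, the `inList(...)` test followed by `resultHistoryAll.add(...)` collapses into a single `add` call whose return value says whether the path was new.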
The QueueItem object is just a way to store the previously visited nodes. It's part of nemanja's code, which is what my code was based on.
Hat tip to him, akappa (for the most efficient answer), and jacobm (for finding a solution like my original code, and explaining its limitations).
In case anyone's actually interested in the work: I'm currently processing over 5 million pathways, of which 60,000 are unique pathways between 900 chemicals. And that's just pathways of 1, 2, 3 or 4 chemicals... Biology is complicated.
EDIT and warning: if anyone is handling huge reams of data like me, Windows 7 (or at least my machine) throws a fit and closes the program once an ArrayList exceeds 63,000 objects, regardless of how you arrange the pointers. The workaround I started with was to restart the list after 60,000 objects while writing everything to CSV. This led to some duplicates between list iterations, and should ultimately be superseded by my move to Linux tomorrow!

Intersection of 2 binary search trees

Hey, so I want to create a new tree which is the intersection (in the mathematical sense) of 2 given binary search trees. I have a method that prints out all the nodes at a particular level of the tree and a method that finds the depth of the tree. I am pasting my work so far, though it is incomplete and I'm stuck on the logic. Help will be appreciated.
public static Bst<Customer> intersect (Bst<Customer> a, Bst<Customer> b){
Bst<Customer> result = new Bst<Customer>();
BTNode<Customer> cur1;
BTNode<Customer> cur2;
BTNode<Customer> cur3;
cur1=a.root;
cur2=b.root;
cur3=result.root;
int Resultdepth;
if(a.maxDepth()<b.maxDepth())
Resultdepth=a.maxDepth();
else
Resultdepth=b.maxDepth();
if(cur1==null || cur2==null){ // Handling the root case initially
result = null;
}
else
cur3.item.set_account_id(cur1.item.get_accountid()+ cur2.item.get_accountid());
cur1=cur1.left;
cur2=cur2.left;
cur3=cur3.left;
while(<some check>){
}
return result;
}
public int maxDepth(){
return mD(root);
}
int mD(BTNode<E> node){
if (node==null) {
return(0);
}
else {
int lDepth = mD(node.left);
int rDepth = mD(node.right);
// use the larger + 1
return(Math.max(lDepth, rDepth) + 1);
}
}
// for printing the nodes at a particular level and giving the starting level
public void PrintAt(BTNode<E> cur, int level, int desiredLevel) {
if (cur == null) {
return;
}
if (level == desiredLevel) {
System.out.print(cur.item.toString() + "");
}
else {
PrintAt(cur.left, level+1, desiredLevel);
PrintAt(cur.right, level+1, desiredLevel);
}
}
You have to traverse both trees in order at the same time and "in sync".
I'd suggest implementing the Iterable interface for your class. Then you get the first values from both trees. If they are equal, put the value in the new tree, and get the next values from both iterators. If not, advance the iterator with the smaller value until the value you get is at least as big as the last value from the other iterator. Rinse and repeat.
The intersection of two trees is presumably the nodes that are in both trees?
Given that you'll have to explore the tree to do this, why not just do an in-order traversal, store the nodes and then do an intersection operation on ordered lists?
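The in-order approach could look roughly like the sketch below. It uses a hypothetical minimal `Node` class with `int` keys rather than the `Bst<Customer>` and `BTNode` types from the question, and it assumes each tree holds distinct keys.

```java
import java.util.ArrayList;
import java.util.List;

class SortedIntersection {
    // Hypothetical minimal BST node, for illustration only.
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // In-order traversal of a BST yields its keys in ascending order.
    static void inOrder(Node n, List<Integer> out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.add(n.key);
        inOrder(n.right, out);
    }

    // Walks both sorted key lists in sync and keeps the keys present in both.
    static List<Integer> intersect(Node a, Node b) {
        List<Integer> xs = new ArrayList<>();
        List<Integer> ys = new ArrayList<>();
        inOrder(a, xs);
        inOrder(b, ys);

        List<Integer> common = new ArrayList<>();
        int i = 0, j = 0;
        while (i < xs.size() && j < ys.size()) {
            int c = Integer.compare(xs.get(i), ys.get(j));
            if (c == 0) {          // same key in both trees
                common.add(xs.get(i));
                i++;
                j++;
            } else if (c < 0) {    // advance whichever side is smaller
                i++;
            } else {
                j++;
            }
        }
        return common;
    }
}
```

The result is a sorted list; inserting it into a new tree in that order would produce a degenerate (list-shaped) BST, so building the tree from the middle element outwards is probably preferable.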
My suggestion for such an intersection is simple:
Given tree A and tree B, to find tree C = A \intersect B:
1: Copy either tree A or B. Let us assume A for clarity.
This copy is now your tree C. Now let's 'trim' it.
2: For c = C.root_node and b = B.root_node:
if b==c,
Repeat the procedure with nodes b.left, c.left
Repeat the procedure with nodes b.right, c.right
else,
Remove c (thereby removing all subsequent children, it is implied they are unequal)
If this implementation would work, it would avoid the use of iterators and the like, and boil down to a simple recursive traversal.
Ask if you would like further clarification.
Regards.
For a recursive implementation of finding the intersection of two binary search trees, I came up with the following code. I am not sure of the time complexity, but it does work all right.
void BST::findIntersection(cell *root1, cell * root2) {
if(root1 == NULL ) {
// cout<<"Tree 1 node is null , returning"<<endl;
return;
}
if(root2 == NULL) {
// cout<<"Tree 2 node is null , returning"<<endl;
return;
}
//cout<<"Comparing tree 1 : "<<root1->data<< " and tree 2 : " << root2->data<<endl;
if(root1->data==root2->data) {
// cout<<"tree 1 equal to tree 2 "<<endl;
insert(root1->data);
// cout<<"Inserting in new tree : "<<root1->data<<endl;
findIntersection(root1->left,root2->left);
findIntersection(root1->right, root2->right);
}
else if(root1->data>root2->data) {
// cout<<"tree 1 > tree 2 "<<endl;
findIntersection(root1,root2->right);
findIntersection(root1->left, root2);
}
else {
// cout<<"tree 1 < tree 2 "<<endl;
findIntersection(root1->right,root2);
findIntersection(root1, root2->left);
}
}

Slow implementation and runs out of heap space (even when vm args are set to 2g)

I'm writing a function which generates all paths in a tree as XPath statements and stores them in a bag. Below is a naive version (sorry, this is long), and below that is my attempt to optimize it:
/**
* Create the structural fingerprint of a tree. Defined as the multiset of
* all paths and their multiplicities
*/
protected Multiset<String> createSF(AbstractTree<String> t,
List<AbstractTree<String>> allSiblings) {
/*
* difference between unordered and ordered trees is that the
* next-sibling axis must also be used
*
* this means that each node's children are liable to be generated more
* than once and so are memo-ised and reused
*/
Multiset<String> res = new Multiset<String>();
// so, we return a set containing:
// 1. the node name itself, prepended by root symbol
res.add("/" + t.getNodeName());
List<AbstractTree<String>> children = t.getChildren();
// all of the childrens' sets prepended by this one
if (children != null) {
for (AbstractTree<String> child : children) {
Multiset<String> sub = createSF(child, children);
for (String nextOne : sub) {
if (nextOne.indexOf("//") == 0) {
res.add(nextOne);
} else {
res.add("/" + nextOne);
res.add("/" + t.getNodeName() + nextOne);
}
}
}
}
// 2. all of the following siblings' sets, prepended by this one
if (allSiblings != null) {
// node is neither original root nor leaf
// first, find current node
int currentNodePos = 0;
int ptrPos = 0;
for (AbstractTree<String> node : allSiblings) {
if (node == t) {
currentNodePos = ptrPos;
}
ptrPos++;
}
// 3. then add all paths deriving from (all) following siblings
for (int i = currentNodePos + 1; i < allSiblings.size(); i++) {
AbstractTree<String> sibling = allSiblings.get(i);
Multiset<String> sub = createSF(sibling, allSiblings);
for (String nextOne : sub) {
if (nextOne.indexOf("//") == 0) {
res.add(nextOne);
} else {
res.add("/" + nextOne);
res.add("/" + t.getNodeName() + nextOne);
}
}
}
}
return res;
}
And now the optimization which is (currently) in a subclass:
private Map<AbstractTree<String>, Multiset<String>> lookupTable = new HashMap<AbstractTree<String>, Multiset<String>>();
public Multiset<String> createSF(AbstractTree<String> t,
List<AbstractTree<String>> allSiblings) {
Multiset<String> lookup = lookupTable.get(t);
if (lookup != null) {
return lookup;
} else {
Multiset<String> res = super.createSF(t, allSiblings);
lookupTable.put(t, res);
return res;
}
}
My trouble is that the optimized version runs out of heap space (the VM args are set to -Xms2g -Xmx2g) and is very slow on moderately large input. Can anyone see a way to improve on this?
Run the code through a profiler. That's the only way to get real facts about the code. Everything else is just guesswork.
"generates all paths in a tree as xpath statements"
How many paths are you creating? This can be non-trivial. The number of paths should be O( n log n ), but the algorithm could be much worse depending on what representation they use for children of a parent.
You should profile the simple enumeration of paths without worrying about the bag storage.
Your code's RAM usage grows exponentially: one more layer means children.size() times more RAM.
Try to use a generator instead of materializing the results: Implement a Multiset which does not calculate the results beforehand but iterates through the tree structure as you call next() on the set's iterator.
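As a rough illustration of the generator idea, the sketch below lazily yields one path string per node while walking the tree with an explicit stack. It uses a hypothetical `TreeNode` class instead of `AbstractTree<String>`, and it only enumerates root-anchored child paths, not the full fingerprint with sibling axes from the question, so treat it as a starting point rather than a drop-in replacement.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class LazyPaths implements Iterable<String> {
    // Hypothetical minimal tree node, for illustration only.
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
        TreeNode add(TreeNode child) { children.add(child); return this; }
    }

    // A node still to be visited, together with the path leading to it.
    private static final class Frame {
        final TreeNode node;
        final String path;
        Frame(TreeNode node, String path) { this.node = node; this.path = path; }
    }

    private final TreeNode root;

    LazyPaths(TreeNode root) { this.root = root; }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private final Deque<Frame> stack = new ArrayDeque<>();
            { stack.push(new Frame(root, "/" + root.name)); }

            @Override
            public boolean hasNext() { return !stack.isEmpty(); }

            @Override
            public String next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Frame f = stack.pop();
                // Push children in reverse so they are produced in document order.
                for (int i = f.node.children.size() - 1; i >= 0; i--) {
                    TreeNode c = f.node.children.get(i);
                    stack.push(new Frame(c, f.path + "/" + c.name));
                }
                return f.path; // only the current path is handed out
            }
        };
    }
}
```

Only the frames waiting on the stack are held in memory at any time, so the consumer can count or hash paths as they arrive instead of materializing them all first.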
