Algorithm for finding objects with varying degrees of specificity


I have a large number of objects arranged in a tree-like structure (each node on the tree has parents and children, starting with one master node, and ending in many child nodes). Each object has its own ID in the form of a string, and there are many duplicate IDs, but no duplicates under the same parent. Example:
ParentA:
childA
childB
childD
ParentB:
childA
childC
childD
The tree is also many layers deep.
I need a method of finding objects that will work like this (example is based on the previous list):
Example 1:
an ArrayList with the string {"childB"} is passed to the algorithm
there are no duplicate nodes with an ID of "childB", so a reference to childB is returned
Example 2:
an ArrayList with the strings {"parentA", "childD"} is passed to the algorithm
there are no duplicate nodes with an ID of "childD" AND a parent with an ID of "parentA", so a reference to the given node is returned
Example 3:
an ArrayList with the string {"childD"} is passed to the algorithm
there are duplicate nodes with an ID of "childD", so the algorithm requests more information (the name of the parent(s))
Keep in mind that there may be many levels of specificity, like {"nodeA", "nodeD", "nodeX", "nodeD"} so some kind of loop, or maybe a recursive method would be needed.
So, any ideas?
Update:
I created a depth-first-search algorithm to go through each node on the tree, and it works very well. The algorithm returns all nodes in the form of one ArrayList. All I need now is a way to select one based on varying degrees of specificity. Can anyone help with that?
The above three examples show what I need.

A depth-first search algorithm may be helpful here.
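One way to layer the "varying specificity" lookup on top of a depth-first search is sketched below. It assumes each node keeps a reference to its parent; the names IdNode, NodeFinder, matches and find are hypothetical and not from the question. A node matches when its own ID equals the last string in the list, its parent's ID equals the one before it, and so on up the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type: an ID, a parent reference, and a list of children.
class IdNode {
    final String id;
    final IdNode parent;
    final List<IdNode> children = new ArrayList<>();

    IdNode(String id, IdNode parent) {
        this.id = id;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }
}

class NodeFinder {
    // Depth-first collection of start and everything below it.
    static void collect(IdNode start, List<IdNode> out) {
        out.add(start);
        for (IdNode child : start.children) {
            collect(child, out);
        }
    }

    // Compares the path right to left against the node and its ancestors.
    static boolean matches(IdNode node, List<String> path) {
        IdNode current = node;
        for (int i = path.size() - 1; i >= 0; i--) {
            if (current == null || !current.id.equals(path.get(i))) {
                return false;
            }
            current = current.parent;
        }
        return true;
    }

    // Returns every node whose trailing ID chain equals the given path.
    static List<IdNode> find(IdNode root, List<String> path) {
        List<IdNode> all = new ArrayList<>();
        collect(root, all);
        List<IdNode> hits = new ArrayList<>();
        for (IdNode node : all) {
            if (matches(node, path)) {
                hits.add(node);
            }
        }
        return hits;
    }
}
```

The caller can then decide what to do with the result: exactly one hit means the node was found, more than one hit means more parent names are needed, and an empty list means nothing matched.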

Related

Cloning references?

This may be a too-specific question, but I'm hoping there's a more general solution to my problem.
I have a class. In this class is a tree-type structure with many parent/child nodes. Also in this class is an Array filled with references to each node in this tree-type structure.
The tree's purpose is for each node to know where to draw itself on screen (every node has relative positional information based on its parent's location).
The Array's purpose is the draw order, I simply draw whatever node is referenced at Array[0] first and so on. (So, the nodes aren't being drawn in the order they appear in the tree necessarily).
My problem is this. I want to clone this overall class that contains these two objects (tree with nodes and an Array that references said nodes). This seems simple.
I create a deep copy of the tree structure and the nodes it contains. Cool.
However, I don't know how to repopulate a new Array with references to these new nodes in this new tree. It seems like it would be simple but I can't figure out how to do it.
I tried to be specific enough to give enough information without being too confusing, hope you understand.
Thanks.

If you're able to change the node data structure, you could add a field for the node's array index. That way, once you've rebuilt your tree you can just walk through it and use the index field to repopulate the array. Not super elegant, but it gets the job done...
Or, to avoid adding a field to your node class, I suppose you could use a temporary hashtable that maps nodes to array indices. Walk through your source array to populate the hashtable, then once you've cloned the tree, walk through the tree, looking up the new nodes in the hashtable (which will work fine assuming you've implemented equals and hashCode properly) and populating the array from those.
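A minimal sketch of the first suggestion (an index field on the node) might look like the following. The class and member names (DrawNode, drawIndex, Scene, rebuildDrawOrder) are assumptions made for illustration; the real node class will carry more state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type with an extra field remembering its draw-order slot.
class DrawNode {
    final String name;
    int drawIndex = -1; // position in the draw-order array, -1 if not drawn
    final List<DrawNode> children = new ArrayList<>();

    DrawNode(String name) {
        this.name = name;
    }

    // Deep copy of this node and its subtree; the index travels with the copy.
    DrawNode deepCopy() {
        DrawNode copy = new DrawNode(name);
        copy.drawIndex = drawIndex;
        for (DrawNode child : children) {
            copy.children.add(child.deepCopy());
        }
        return copy;
    }
}

class Scene {
    // Records each node's slot before cloning.
    static void assignIndices(DrawNode[] drawOrder) {
        for (int i = 0; i < drawOrder.length; i++) {
            drawOrder[i].drawIndex = i;
        }
    }

    // Walks the cloned tree and puts every node back into its slot.
    static DrawNode[] rebuildDrawOrder(DrawNode clonedRoot, int size) {
        DrawNode[] result = new DrawNode[size];
        fill(clonedRoot, result);
        return result;
    }

    private static void fill(DrawNode node, DrawNode[] out) {
        if (node.drawIndex >= 0) {
            out[node.drawIndex] = node;
        }
        for (DrawNode child : node.children) {
            fill(child, out);
        }
    }
}
```

Calling assignIndices on the original array, then deepCopy on the root, then rebuildDrawOrder on the copy yields an array that refers only to the new nodes, in the original draw order.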

Test depth-first tree

I wrote a Java program that traverses a tree depth-first. The program is correct, but the order in which the children of a node are visited is random. For example, in this tree:
the result is sometimes one of:
A-B-E-C-F-D
A-C-F-D-B-E
A-B-E-D-C-F
I want to write a unit test for this program, but I have no idea how to do it. Can anyone help?
I thought of building a List that contains the elements and comparing it with the result of my depth-first traversal, but the traversal order is random, so I cannot compare it with the elements of the List.

There are 2 properties you want to test for:
Each node is visited exactly once
Traversal is depth first
The first is easy to test: the number of unique nodes visited must be equal to the number of nodes in the tree, and no node may appear twice. You can test that against any random tree.
The second is slightly trickier - expressing it in the general case is probably more complex than the tested code. Easier just to pick some representative constraints based on the specific known data, i.e.
B must be after A
E must be immediately after B
...
Hard to conceive of realistic code that satisfies the first property for all trees, but would fail the second only in specific cases. So outside of the most formal of safety critical systems (and what are they doing using dynamic data structures anyway?), that's going to be enough.
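For a tree, both properties can also be checked together for an arbitrary visiting order. The sketch below is one possible checker, not the asker's code: it assumes a pre-order traversal (a node is reported before its children) and a tree given as a map from each node to its children. The names DfsOrderCheck and isValidPreorder are hypothetical. It walks the reported order while keeping the current root-to-node path on a stack, and rejects the order if the traversal leaves a node that still has unvisited children.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DfsOrderCheck {
    static boolean isValidPreorder(Map<String, List<String>> children,
                                   String root, List<String> order) {
        // Derive each node's parent from the child lists.
        Map<String, String> parent = new HashMap<>();
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            for (String child : e.getValue()) {
                parent.put(child, e.getKey());
            }
        }
        if (order.isEmpty() || !order.get(0).equals(root)) {
            return false;
        }
        Set<String> seen = new HashSet<>();
        Deque<String> path = new ArrayDeque<>();           // current root-to-node path
        Map<String, Integer> remaining = new HashMap<>();  // unvisited children per node
        for (String node : order) {
            if (!seen.add(node)) {
                return false; // visited twice
            }
            if (!node.equals(root)) {
                String p = parent.get(node);
                if (p == null) {
                    return false; // not part of the tree
                }
                // Backtrack to the parent; a node may only be left once it is exhausted.
                while (!path.isEmpty() && !path.peek().equals(p)) {
                    if (remaining.get(path.peek()) > 0) {
                        return false;
                    }
                    path.pop();
                }
                if (path.isEmpty()) {
                    return false;
                }
                remaining.merge(p, -1, Integer::sum);
            }
            path.push(node);
            remaining.put(node, children.getOrDefault(node, Collections.<String>emptyList()).size());
        }
        // Every node must have been visited.
        return seen.size() == parent.size() + 1;
    }
}
```

A unit test can then run the random traversal many times and assert isValidPreorder on each result, instead of comparing against one fixed list.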

I haven't clicked on your link, but if the code is truly random and is intended to be, then you should make your unit test so that it says "given this input, then the output must be one of these three things". This isn't ideal because it might take many, many runs before a bug shows up (i.e. the first few times you run it, it might just randomly mask the bug), but I suspect it's the best you can do for testing an algorithm with random behaviour.

This means that the order of the children of each node is not deterministic. You probably used a Set to hold children. Consider using a LinkedHashSet (which preserves insertion order) or a SortedSet (which sorts children). This way, the order will always be the same.
If randomness is a feature of your tree and you want to keep it as is, then see the other answers, or change the algorithm itself to make sure you always sort the children while traversing the tree.

Choose a data set for the unit test that has just a few valid results (it should however have more than one, obviously), and test whether the result is one of them.
Alternatively, you could try to impose a well-defined order on the nodes (e.g. by alphabetically sorting the children of each node, instead of managing them in a Set)

How to build in Java a Weighted Directed Acyclic Graph

I did a search on similar topics, but the answers are too vague for my level of understanding and comprehension, and I don't think they're specific enough to my question.
Similar threads:
Tree (directed acyclic graph) implementation
Representing a DAG (directed acyclic graph)
Question:
I have formatted a text file which contains data of the following format...
Example dataset:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622
GO:0000112#part_of: GO:0000442
GO:0000118#is_a: GO:0016581#is_a: GO:0034967#is_a: GO:0070210#is_a: GO:0070211#is_a: GO:0070822#is_a: GO:0070823#is_a: GO:0070824
GO:0000120#is_a: GO:0000500#is_a: GO:0005668#is_a: GO:0070860
GO:0000123#is_a: GO:0005671#is_a: GO:0043189#is_a: GO:0070461#is_a: GO:0070775#is_a: GO:0072487
GO:0000126#is_a: GO:0034732#is_a: GO:0034733
GO:0000127#part_of: GO:0034734#part_of: GO:0034735
GO:0000133#is_a: GO:0031560#is_a: GO:0031561#is_a: GO:0031562#is_a: GO:0031563#part_of: GO:0031500
GO:0000137#part_of: GO:0000136
I'm looking to construct a weighted directed DAG from this data (the above is just a snippet). The whole dataset of 106kb is here: Source
--------------------------------------------------
Taking into consideration line-by-line, the data of each line is explained as follows...
First line as an example:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622
'#' is the delimiter/tokenizer for the line data.
The First term, GO:0000109 is the node name.
The subsequent terms of is_a: GO:xxxxxxx OR part_of: GO:xxxxxxx are the nodes which are connected to GO:0000109.
Some of the subsequent terms have connections too, as depicted in the dataset.
When it is is_a, the weight of the edge is 0.8.
When it is part_of, the weight of the edge is 0.6.
--------------------------------------------------
I have Googled what DAGs are, and I understand the concept. However, I still have no idea how to put it into code. I'm using Java.
From my understanding, a graph generally consists of nodes and arcs. Does this graph require an adjacency list to determine the direction of the connection? If so, I'm not sure how to combine the graph and adjacency list to communicate with each other.
After constructing the graph, my secondary goal is to find out the degree of each node from the root node. There is a root node in the dataset.
For illustration, I have drawn out a sample of the connection of the first line of data below:
Image Link
I hope you guys understand what I'm trying to achieve here. Thanks for looking through my problem. :)

Because it's easier to think about, I'd prefer to represent it as a tree. (Also makes it easier to traverse the map and keep intermediate degrees.)
You could have a Node class, which would have a Collection of child Node objects. If you must, you could also represent the child relationships as a Relationship object, which would have both a weight and a Node pointer, and you could store a Collection of Relationship objects.
Then you could do a walk on the tree starting from the root, and mark each visited node with its degree.
class Node {
    String name;
    List<Relationship> children;
}
class Relationship {
    Node child;
    double weight;
}
class Tree {
    Node root;
}
Here, Tree should probably have a method like this:
public Node findNodeByName(String name);
And Node should probably have a method like this:
public void addChild(Node n, double weight);
Then, as you parse each line, you call Tree.findNodeByName() to find the matching node (and create one if none exists... but that shouldn't happen, if your data is good), and append the subsequent items on the line to that node.
As you've pointed out, DAGs cannot really be converted to trees, especially because some nodes have multiple parents. What you can do is insert the same node as the child of multiple parents, perhaps using a hash table to decide if a particular node has been traversed or not.
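As an alternative to looking nodes up by walking the tree, the parsing step can be sketched with an adjacency list keyed by node name, which also copes with nodes that have several parents. This is only a sketch under stated assumptions: the first term on a line is treated as the source of each edge and the later terms as targets (the question does not state the direction), "degree from the root" is read as the minimum number of edges from the root, and the names WeightedEdge, GoGraph, addLine and depthsFrom are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WeightedEdge {
    final String target;
    final double weight;

    WeightedEdge(String target, double weight) {
        this.target = target;
        this.weight = weight;
    }
}

class GoGraph {
    // Adjacency list: node name -> outgoing weighted edges.
    final Map<String, List<WeightedEdge>> adjacency = new LinkedHashMap<>();

    // Parses one line such as "GO:0000109#is_a: GO:0000110#part_of: GO:0008622".
    void addLine(String line) {
        String[] tokens = line.trim().split("#");
        String from = tokens[0].trim();
        adjacency.computeIfAbsent(from, k -> new ArrayList<>());
        for (int i = 1; i < tokens.length; i++) {
            // Split only at the first ':' so the "GO:xxxxxxx" name stays intact.
            String[] parts = tokens[i].split(":", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Malformed term: " + tokens[i]);
            }
            String relation = parts[0].trim();
            String to = parts[1].trim();
            double weight;
            if (relation.equals("is_a")) {
                weight = 0.8;
            } else if (relation.equals("part_of")) {
                weight = 0.6;
            } else {
                throw new IllegalArgumentException("Unknown relation: " + relation);
            }
            adjacency.get(from).add(new WeightedEdge(to, weight));
            adjacency.computeIfAbsent(to, k -> new ArrayList<>());
        }
    }

    // Breadth-first search: minimum number of edges from root to each reachable node.
    Map<String, Integer> depthsFrom(String root) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (WeightedEdge e : adjacency.getOrDefault(current, Collections.<WeightedEdge>emptyList())) {
                if (!depth.containsKey(e.target)) {
                    depth.put(e.target, depth.get(current) + 1);
                    queue.add(e.target);
                }
            }
        }
        return depth;
    }
}
```

Feeding every line of the file to addLine builds the whole graph; because depthsFrom records each node the first time it is reached, a node with multiple parents is only processed once.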

Reading the comments, you seem confused by how a Node can contain Relationships which each in turn contain a Node. This is quite a common strategy; in general it is called the Composite pattern.
The idea in the case of trees is that the tree can be thought of as consisting of multiple subtrees - if you were to disconnect a node and all its descendants from the tree, the disconnected nodes would still make a tree, though a smaller one. Thus, a natural way to represent a tree is to have each Node contain other Nodes as children. This approach lets you do many things recursively, which in the case of trees is often, again, natural.
Letting a Node keep track of its children and no other parts of the tree also emulates the mathematical directed graph - each vertex is "aware" only of its edges and nothing else.
Example recursive tree implementation
For instance, to search for an element in a binary search tree, you would call the root's search method. The root then checks whether the sought element is equal, less or greater than itself. If it is equal, the search exits with an appropriate return value. If it is less or greater, the root would instead call search on the left or right child, respectively, and they would do exactly the same thing.
Analogously, to add a new Node to the tree, you would call the root's add method with the new node as a parameter. The root decides whether it should adopt the new node or pass it on to one of its children. In the latter case, it would select a child and call its add method with the new Node as a parameter.
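The search and add behaviour described above could be sketched as follows for a binary search tree of int values. BstNode and its members are hypothetical names, and sending duplicates to the right subtree is an arbitrary choice made for this sketch.

```java
// Hypothetical binary search tree node: each node only knows its own two children.
class BstNode {
    final int value;
    BstNode left;
    BstNode right;

    BstNode(int value) {
        this.value = value;
    }

    // Returns true if target is stored in this node or anywhere below it.
    boolean search(int target) {
        if (target == value) {
            return true;
        }
        // Smaller values live in the left subtree, larger ones in the right.
        BstNode next = target < value ? left : right;
        return next != null && next.search(target);
    }

    // Adopts the new node as a direct child if the slot is free,
    // otherwise passes it on to the appropriate child.
    void add(BstNode node) {
        if (node.value < value) {
            if (left == null) {
                left = node;
            } else {
                left.add(node);
            }
        } else {
            if (right == null) {
                right = node;
            } else {
                right.add(node);
            }
        }
    }
}
```

Every call only looks at the current node and delegates the rest to one child, which is the recursive structure the answer describes.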

How do I create a binary tree with two int values?

I'm trying to create a binary tree that contains two int values and one string value, sorted in lexicographic order, but I'm not sure what to do. I've created an array list, which has already been sorted, but the binary tree has to be reference-based and is not sorted, so I'm thinking about sorting the data while creating the tree. Can anyone help with this? Any brief idea would be appreciated.

A binary tree is a recursive thing. Make a class called BinaryTree (I hope you are in C++, .NET or Java) that has two references to two other BinaryTrees (null by default). Then make an insert function that is recursive.
I don't know what you are trying to accomplish, but when building a binary tree, arrays are usually nowhere to be found.

You first should create a class to store your data and implement Comparable or use a Comparator.
public class Data { // Implement Comparable...
    private String s;
    private int n1;
    private int n2;
    // Implement constructors, getters, setters based on what you need...
    // Implement compareTo (+ equals + hashCode) unless you're going with Comparator
}
Then use a Collection that implements SortedSet to store your data, TreeSet is a good choice. The objects in the SortedSet are stored by reference so if you modify a value set in a local variable it will be modified in the collection as well.
Edit: If I understood your question about reference based lists correctly the following is possible in Java.
List<Data> dataList = new ArrayList<>();
// ... add at least five Data objects to dataList ...
Data data = dataList.get(4);
data.setS("103"); // s is a String. Modifies s in the local data object and in dataList because both refer to the same object.
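A minimal sketch of the Comparable plus TreeSet approach is shown below. The class name SortableData and the ordering (by string first, then by the two ints as tie-breakers) are assumptions for illustration. The fields are final here because changing a field that takes part in compareTo while the object sits in a TreeSet would leave the set in an inconsistent order.

```java
import java.util.Objects;

// Hypothetical data holder ordered lexicographically by s, then by n1, then by n2.
class SortableData implements Comparable<SortableData> {
    final String s;
    final int n1;
    final int n2;

    SortableData(String s, int n1, int n2) {
        this.s = s;
        this.n1 = n1;
        this.n2 = n2;
    }

    @Override
    public int compareTo(SortableData other) {
        int c = s.compareTo(other.s); // String.compareTo is lexicographic
        if (c != 0) {
            return c;
        }
        c = Integer.compare(n1, other.n1);
        return c != 0 ? c : Integer.compare(n2, other.n2);
    }

    // equals and hashCode are kept consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SortableData)) {
            return false;
        }
        SortableData d = (SortableData) o;
        return s.equals(d.s) && n1 == d.n1 && n2 == d.n2;
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, n1, n2);
    }
}
```

Adding such objects to a java.util.TreeSet keeps them sorted as they are inserted, so iterating over the set visits them in lexicographic order of s.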

It sounds like you already have a data structure to store your two int values and a string (since you have them sorted in an array list). You can include this data structure in a "tree node". A node typically has a reference pointer to a parent node (unless it is the root node) and 2 child nodes.
Since you want the tree to be ordered, one option is a special form of binary tree called a heap. Note that a heap only guarantees that each parent is ordered relative to its children, not that the whole tree is sorted. The Binary Heap Wikipedia page linked below describes the heap operations.
http://en.wikipedia.org/wiki/Binary_heap
Here's some more general information on heaps and trees.
http://en.wikipedia.org/wiki/Binary_tree
http://en.wikipedia.org/wiki/Heap_(data_structure)
EDIT: You don't have to use a literal tree structure to store your data in tree form. It is perfectly acceptable to build a tree using an array. Instead of using reference pointers (a parent and 1 or 2 child nodes), you can compute an index into the array. Each set of children is considered a "row" in the tree. The root element is on row zero. Its two children are on the first row. The children of the root's children are on the second row, and so on.
Using this pattern, the children of the node stored at index n can be found at array[2*n+1] and array[2*n+2]. The parent of the node at index n (for n > 0) can be found at array[floor((n-1)/2)].
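The index arithmetic can be written down directly. In this small sketch (ArrayTree and its method names are hypothetical), n is the array index of a node, with the root at index 0.

```java
// Index arithmetic for a binary tree stored level by level in an array.
class ArrayTree {
    static int leftChild(int n) {
        return 2 * n + 1;
    }

    static int rightChild(int n) {
        return 2 * n + 2;
    }

    // Valid for n >= 1; Java integer division already rounds down here.
    static int parent(int n) {
        return (n - 1) / 2;
    }
}
```

For an array holding A, B, C, D, E, F, G in that order, the children of B (index 1) are at indices 3 and 4, that is D and E, and the parent of G (index 6) is at index 2, that is C.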

Efficient way to walk through an object tree?

I've got a set of TreeNodes, each of which has an id, a Collection of parent nodes, and a collection of child nodes.
For a given node ID, I'm looking for an efficient way to generate all the links that pass through that node. So in short, start at the node, and iterate through all its children. If a node has more than one child, create a link for each child. Then traverse the children, and so on.
I'd also like to be able to do this in an 'upward' direction, through the parent nodes.
Is there a simple algorithm to do this?
EDIT: Oh, and I'd like to be able to output the IDs of all the nodes in a given chain...

You are looking for a breadth-first or depth-first search. At its simplest, it is no more than the following (this is a depth-first search).
void visit(Node node) {
    // Recurse into every child first, then process the node itself.
    for (Node childNode : node.children) {
        visit(childNode);
    }
    doStuff(node);
}
The problem is that the graph may contain cycles, in which case the algorithm will recurse forever. To get around this you must remember visited nodes by flagging them or storing them in a collection. If the graph has no cycles - for example if it is a tree - this short algorithm will already work.
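A cycle-safe variant that stores visited nodes in a collection could look like this sketch. GraphNode and SafeDfs are hypothetical names, and appending the node's id to a list stands in for whatever processing is done per node.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical graph node; equals/hashCode are not overridden, so the
// visited set below compares nodes by identity.
class GraphNode {
    final String id;
    final List<GraphNode> children = new ArrayList<>();

    GraphNode(String id) {
        this.id = id;
    }
}

class SafeDfs {
    // Depth-first traversal that skips nodes it has already seen.
    static List<String> traverse(GraphNode start) {
        List<String> out = new ArrayList<>();
        Set<GraphNode> visited = new HashSet<>();
        visit(start, visited, out);
        return out;
    }

    private static void visit(GraphNode node, Set<GraphNode> visited, List<String> out) {
        if (!visited.add(node)) {
            return; // already visited: this is what breaks cycles
        }
        for (GraphNode child : node.children) {
            visit(child, visited, out);
        }
        out.add(node.id); // process the node after its children
    }
}
```

On a graph where a points to b, b points to c, and c points back to a, the traversal terminates after handling each node once.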
And by the way, if a TreeNode has multiple parents, it's not a tree but a graph node.

Well, if the nodes have a reference to the parent, it's as simple as getting the parent recursively (in a tree, each node has only one parent, or none at all if it is the root).
If there's no such reference, then you could use a breadth-first search, for instance, with your collection of parent nodes as the initial set.
-- EDIT --
Since a node may have more than one parent, you're dealing with a graph. There are also graph traversal algorithms for that case.
Make sure that, if your graph has a cycle, you won't end up in an infinite loop.
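For the upward direction with multiple parents, a breadth-first walk over the parent links with a visited set is one option. The sketch below assumes each node stores its parents; LinkedNode, UpwardWalk and ancestorIds are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node that knows both its parents and its children.
class LinkedNode {
    final String id;
    final List<LinkedNode> parents = new ArrayList<>();
    final List<LinkedNode> children = new ArrayList<>();

    LinkedNode(String id) {
        this.id = id;
    }

    void addChild(LinkedNode child) {
        children.add(child);
        child.parents.add(this);
    }
}

class UpwardWalk {
    // Collects the ids of all ancestors of start, nearest ancestors first.
    static List<String> ancestorIds(LinkedNode start) {
        List<String> ids = new ArrayList<>();
        Set<LinkedNode> seen = new HashSet<>();
        Deque<LinkedNode> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            LinkedNode current = queue.remove();
            for (LinkedNode parent : current.parents) {
                // seen.add returns false for nodes reached earlier, so shared
                // ancestors and cycles are handled only once.
                if (seen.add(parent)) {
                    ids.add(parent.id);
                    queue.add(parent);
                }
            }
        }
        return ids;
    }
}
```

The same loop run over children instead of parents gives the downward direction.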

You might want to check out depthFirstEnumeration() and breadthFirstEnumeration() on DefaultMutableTreeNode. However, this doesn't solve your problem of wanting to navigate the tree in a bottom-up fashion.
