I have a tree represented with the JGraphT library. There are various types of nodes, and I need to cut out every subtree that starts at a particular node type.
As you can see in this example, this tree represents the source code of a Java class. I need to create multiple JGraphT objects by splitting the main tree at each "Entry" node. In total I should get 7 trees from this big one. The structure I use is a DirectedPseudograph.
Although I'm not 100% clear about what you want, there seem to be several possible approaches.
Starting from every outgoing neighbor of the root node, you could run a depth-first search and record the nodes it returns. The nodes reachable by the DFS belong to the same subtree. For this you can use the DepthFirstIterator.
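A minimal sketch of the DFS idea, applied to every node of the wanted type rather than only to the root's neighbors. To keep it self-contained it uses a plain `Map<String, List<String>>` of children instead of a JGraphT graph, and the type lookup `typeOf` is hypothetical; with JGraphT, `new DepthFirstIterator<>(graph, start)` walks the same reachable set in a directed graph. Note that if "Entry" nodes can be nested, the outer subtree will also contain the inner one.

```java
import java.util.*;

/** Hypothetical helper: one node set per node of the wanted type, found by DFS. */
class SubtreeSplitter {

    /** Iterative depth-first search collecting every node reachable from start. */
    static Set<String> reachableFrom(Map<String, List<String>> children, String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) continue; // already seen (matters for pseudographs)
            for (String child : children.getOrDefault(node, List.of())) {
                stack.push(child);
            }
        }
        return visited;
    }

    /** Returns, for each node whose type equals wantedType, the nodes of its subtree. */
    static Map<String, Set<String>> split(Map<String, List<String>> children,
                                          Map<String, String> typeOf,
                                          String wantedType) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : typeOf.entrySet()) {
            if (wantedType.equals(e.getValue())) {
                result.put(e.getKey(), reachableFrom(children, e.getKey()));
            }
        }
        return result;
    }
}
```

Each resulting node set could then be turned into its own graph, for example with JGraphT's `AsSubgraph`.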
You could create a subgraph without the root node, for instance by using the AsSubgraph class, and then invoke the ConnectivityInspector on the resulting induced subgraph. Since every subtree becomes a separate connected component once the root is removed, the connectivity inspector will find each of these components.
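The same idea without a library, as a hedged sketch: drop the root, treat the remaining edges as undirected, and collect connected components with a breadth-first search. The class and method names are hypothetical. In recent JGraphT versions the equivalent would be roughly `new ConnectivityInspector<>(new AsSubgraph<>(graph, verticesWithoutRoot)).connectedSets()`, which for a directed graph returns the weakly connected components.

```java
import java.util.*;

/** Hypothetical sketch: subtrees under the root are the components left after deleting the root. */
class ComponentSplitter {

    static List<Set<String>> componentsWithoutRoot(Map<String, List<String>> children, String root) {
        // Undirected adjacency view that leaves out every edge touching the root.
        Map<String, List<String>> undirected = new HashMap<>();
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            String parent = e.getKey();
            undirected.computeIfAbsent(parent, k -> new ArrayList<>());
            for (String child : e.getValue()) {
                undirected.computeIfAbsent(child, k -> new ArrayList<>());
                if (parent.equals(root) || child.equals(root)) continue;
                undirected.get(parent).add(child);
                undirected.get(child).add(parent);
            }
        }
        undirected.remove(root);

        // Breadth-first search from every not-yet-seen vertex yields one component each.
        List<Set<String>> components = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : undirected.keySet()) {
            if (!seen.add(start)) continue;
            Set<String> component = new HashSet<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String node = queue.poll();
                component.add(node);
                for (String next : undirected.get(node)) {
                    if (seen.add(next)) queue.add(next);
                }
            }
            components.add(component);
        }
        return components;
    }
}
```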
Btw, unless you need the capabilities of a Pseudograph, for performance it would be better to use the SimpleDirectedGraph. Obviously, the latter does not allow parallel edges or self-loops.
I need to store a reasonably large Directed Acyclic Graph in Java (on the order of 100,000 nodes, depth between 7 and 20, irregularly shaped, average depth 13).
What would be the best-performing data structure(s) to store it if the predominant operation I need after building the data structure is:
99% of operations: find the full set of ascendant paths (from the root down to a given node)
1% of operations: find all children or, more often, all ancestors of a given node.
As may be obvious, I'd like the first operation to be O(1) if possible, as opposed to O(average depth).
Please note that for the purposes of this question, the data structure is write-once: after I build it from a list of nodes and edges, the graph topology will never change.
My naive implementation would be to store it as a combination of:
HashMap<Integer, Integer[]> childrenPerParent;
HashMap<Integer, Integer[]> ascendantPaths;
I.e., for each node I store a list of its children and, separately, the set of paths from that node to the root.
Downside: this seems very wasteful in terms of space, since every inner node is stored many times over in ascendantPaths. Given the size estimates, we would store an extra 100,000 * 13 = 1.3 million node copies in ascendantPaths, each of which is an object to be created and stored.
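To make the naive layout concrete, here is a sketch of building both maps from an edge list. It is an assumption-laden illustration: it uses `List<Integer>` instead of `Integer[]`, stores a list of paths per node (a node in a DAG can have several), and the class name `DagPaths` is hypothetical. After construction, `pathsTo` is a single hash lookup; the cost is that every path is a full copy of its parent's path plus one node, which is exactly the duplication the question worries about.

```java
import java.util.*;

/** Hypothetical sketch of the naive write-once layout. */
class DagPaths {
    final Map<Integer, List<Integer>> childrenPerParent = new HashMap<>();
    final Map<Integer, List<Integer>> parentsPerChild = new HashMap<>();
    final Map<Integer, List<List<Integer>>> ascendantPaths = new HashMap<>();

    /** Each edge is {parent, child}; all paths are precomputed at build time. */
    DagPaths(int root, List<int[]> edges) {
        for (int[] e : edges) {
            childrenPerParent.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            parentsPerChild.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        ascendantPaths.put(root, List.of(List.of(root)));
        for (int node : parentsPerChild.keySet()) pathsTo(node);
    }

    /** O(1) lookup once built; otherwise derives paths from the parents' paths (memoised). */
    List<List<Integer>> pathsTo(int node) {
        List<List<Integer>> cached = ascendantPaths.get(node);
        if (cached != null) return cached;
        List<List<Integer>> paths = new ArrayList<>();
        for (int parent : parentsPerChild.getOrDefault(node, List.of())) {
            for (List<Integer> prefix : pathsTo(parent)) {
                List<Integer> path = new ArrayList<>(prefix); // full copy: this is where the space goes
                path.add(node);
                paths.add(path);
            }
        }
        ascendantPaths.put(node, paths);
        return paths;
    }
}
```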
I would recommend using Neo4J. It's a graph database implemented in Java with a lot of low-level optimizations (e.g., node and edge attributes are stored in their own blocks so that node identities and their edges can be packed), and it mmaps the on-disk database. Following an edge is independent of the number of nodes in the graph or edges on the origin.
I have a tree of several thousand nodes, decorated by boolean attributes, something like this (attributes in parentheses):
Root (x=true, y=true, z=false)
    Interior 1
        Leaf 1 (x=false, z=false)
        Leaf 2 (x=false, y=false, z=false)
    Interior 2
        Leaf 3
    etc.
What I would like to do is find the smallest number of decorations necessary to preserve the values of the attributes, given the following constraints/info:
Attributes are inherited by child nodes
Only the resulting attributes of the leaf nodes are important (including inherited attributes). So if setting a "default" attribute on an interior node lets me drop a bunch of attributes on its children, that's okay.
There is a shorthand in our model for setting all attributes to either true or false. For example, (x=false,y=false,z=false) can be represented by one decorator, whereas (x=false,y=false,z=true) would take three.
Child nodes will greatly outnumber interior nodes (at least 25 to 1).
The initial state of the tree will have many redundancies.
I'm using Java and adding an external lib to deal with this isn't a big deal.
These constraints are not flexible, as I'm working on an integration layer with a large enterprise system, so all I can do is try to minimize the number of attribute values we have to store and transmit.
I think constraint #3 is throwing me for a loop, because without it I could just deal with each attribute individually, which is simple (and I already implemented a solution to that before I realized more attributes were coming).
I hope this is descriptive enough to give a picture of the general problem. I can give more examples or information if required. Thank you!
I think (3.) can be mainly ignored because we'd only be interested in it for leaves.
Here's what I would suggest:
1. For every leaf whose booleans are all the same, use the shortcut (3.).
2. Then, for every internal node, set each attribute to the majority value among the leaves below it that were not handled by step 1, and remove the now-redundant assignments.
3. For higher internal nodes, do the same, looking at their immediate children, all the way up to the root.
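One possible reading of these steps as code, offered as an untested-in-production sketch. Assumptions: every leaf's `required` map lists a value for every attribute, ties in the majority vote go to `true`, and the names (`DecorationMinimizer`, `minimize`, `effective`, `cost`) are hypothetical. Leaves that qualify for the shortcut keep all their values and are left out of the vote; everything else is processed bottom-up, so each node ends up with the majority of its children and children drop whatever they now inherit.

```java
import java.util.*;

/** Hypothetical sketch of the majority heuristic for inherited boolean attributes. */
class DecorationMinimizer {

    static final class Node {
        final List<Node> children = new ArrayList<>();
        final Map<String, Boolean> required;                   // leaves only: values that must survive
        final Map<String, Boolean> assigned = new HashMap<>(); // decorations we keep
        Node parent;

        Node(Map<String, Boolean> required) { this.required = required; }
        Node add(Node child) { child.parent = this; children.add(child); return this; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Step 1: a leaf whose values are all equal is handled by the all-true/all-false shortcut. */
    static boolean usesShortcut(Node n) {
        return n.isLeaf() && new HashSet<>(n.required.values()).size() == 1;
    }

    /** Steps 2 and 3: post-order, push each attribute's majority value up one level. */
    static void minimize(Node n, List<String> attrs) {
        if (n.isLeaf()) { n.assigned.putAll(n.required); return; }
        for (Node c : n.children) minimize(c, attrs);
        for (String a : attrs) {
            int trues = 0, falses = 0;
            for (Node c : n.children) {
                Boolean v = c.assigned.get(a);
                if (v == null || usesShortcut(c)) continue; // shortcut leaves set everything themselves
                if (v) trues++; else falses++;
            }
            if (trues + falses == 0) continue;              // nothing below needs this attribute from n
            boolean majority = trues >= falses;             // tie goes to true
            n.assigned.put(a, majority);
            for (Node c : n.children) {                     // drop values the child now inherits
                if (!usesShortcut(c) && Boolean.valueOf(majority).equals(c.assigned.get(a))) {
                    c.assigned.remove(a);
                }
            }
        }
    }

    /** Effective value: nearest explicit assignment on the path up to the root. */
    static Boolean effective(Node n, String attr) {
        for (Node cur = n; cur != null; cur = cur.parent) {
            Boolean v = cur.assigned.get(attr);
            if (v != null) return v;
        }
        return null;
    }

    /** Decorators needed: 1 when a node sets every attribute to one value, else one per attribute. */
    static int cost(Node n, List<String> attrs) {
        int own = n.assigned.isEmpty() ? 0
                : (n.assigned.size() == attrs.size() && new HashSet<>(n.assigned.values()).size() == 1) ? 1
                : n.assigned.size();
        for (Node c : n.children) own += cost(c, attrs);
        return own;
    }
}
```

Checking `effective` against `required` for every leaf after `minimize` is a cheap way to confirm that no leaf value was lost.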
This is a heuristic, and I haven't tried it, but would be my first shot if I were you.
Let me know how it goes.
Let's say I want all nodes whose parent(s) match a certain condition.
Is there an accepted way of doing this other than inspecting each node and building a results object full of either nodes or subtrees?
If the tree is not already sorted or indexed on the search condition in some way, then you cannot prune the tree traversal (i.e. you cannot decide to skip the right child at some particular node, for instance). Therefore, you have no choice but to traverse the entire tree.
That's pretty much it. You simply have to access each node to see whether it matches the criteria.
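A sketch of that full traversal for the question's case: visit every node once with an explicit stack and, whenever a node satisfies the condition, collect its children. The node class and method names are hypothetical; the `Predicate` stands in for "some certain condition".

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical n-ary tree node. */
class SimpleNode<T> {
    final T value;
    final List<SimpleNode<T>> children = new ArrayList<>();
    SimpleNode(T value) { this.value = value; }
    SimpleNode<T> add(SimpleNode<T> child) { children.add(child); return this; }
}

class ParentQuery {
    /** Visits every node once (no pruning is possible) and keeps children of matching parents. */
    static <T> List<SimpleNode<T>> childrenOfMatching(SimpleNode<T> root, Predicate<T> parentMatches) {
        List<SimpleNode<T>> result = new ArrayList<>();
        Deque<SimpleNode<T>> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            SimpleNode<T> node = stack.pop();
            boolean match = parentMatches.test(node.value);
            for (SimpleNode<T> child : node.children) {
                if (match) result.add(child);
                stack.push(child);
            }
        }
        return result;
    }
}
```

An explicit stack avoids deep recursion on degenerate trees; the work is still proportional to the number of nodes.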
But there are some ways to speed it up:
Use an index. If you are repeatedly querying the same property, it might be beneficial to create an index on that property and use it for searching. This could speed up your code immensely. Doing so is not free, though: you need to build the index up front, update it every time you update the tree, and spend extra memory to keep it.
If you have a multi-core machine, you can process individual subtrees in parallel by using separate threads.
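For the index suggestion, a minimal sketch: a hash map from a property value to the nodes that have it, filled in one pass and kept in sync by calling `add`/`remove` whenever the tree changes. `PropertyIndex` and the key function are hypothetical; the maintenance calls are the "not free" part mentioned in the answer.

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch: index from a node property to the nodes having that property. */
class PropertyIndex<N, K> {
    private final Map<K, List<N>> byKey = new HashMap<>();
    private final Function<N, K> keyOf;

    PropertyIndex(Function<N, K> keyOf) { this.keyOf = keyOf; }

    /** Call for every node inserted into the tree. */
    void add(N node) {
        byKey.computeIfAbsent(keyOf.apply(node), k -> new ArrayList<>()).add(node);
    }

    /** Call for every node removed from the tree, or the index goes stale. */
    void remove(N node) {
        List<N> bucket = byKey.get(keyOf.apply(node));
        if (bucket != null) bucket.remove(node);
    }

    /** A hash probe instead of a full traversal. */
    List<N> find(K key) { return byKey.getOrDefault(key, List.of()); }
}
```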
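For the multi-core suggestion, a sketch using the JDK's fork/join framework: each child subtree becomes a `RecursiveTask` that may run on another worker thread, and the results are summed. Counting matches is just an example query, and the class names are hypothetical. Forking one task per node is deliberately simple; a real version would probably switch to a sequential loop below some subtree size.

```java
import java.util.*;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Predicate;

/** Hypothetical sketch: count matching nodes, forking one task per child subtree. */
class ParallelCount<T> extends RecursiveTask<Long> {

    static final class Node<V> {
        final V value;
        final List<Node<V>> children = new ArrayList<>();
        Node(V value) { this.value = value; }
        Node<V> add(Node<V> child) { children.add(child); return this; }
    }

    private final Node<T> node;
    private final Predicate<T> matches;

    ParallelCount(Node<T> node, Predicate<T> matches) {
        this.node = node;
        this.matches = matches;
    }

    @Override
    protected Long compute() {
        long count = matches.test(node.value) ? 1 : 0;
        List<ParallelCount<T>> forked = new ArrayList<>();
        for (Node<T> child : node.children) {
            ParallelCount<T> task = new ParallelCount<>(child, matches);
            task.fork();                 // subtree may be processed by another worker thread
            forked.add(task);
        }
        for (ParallelCount<T> task : forked) count += task.join();
        return count;
    }

    static <T> long count(Node<T> root, Predicate<T> matches) {
        return ForkJoinPool.commonPool().invoke(new ParallelCount<>(root, matches));
    }
}
```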
I have a question. I have written merge sort code such that it produces a binary decision tree. But when I merge the elements, I don't need the external nodes that contain just one element. What should I do with them? Should I return them?
I'd say you remove the parent because your tree at that point degrades to a list. If the action at the leaf depends on data at the parent then there's not much you can do unless you can merge actions into one.
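For reference, this is how a standard top-down merge sort treats one-element subarrays: they are the base case of the recursion, already sorted, so they are simply returned to the caller, which merges them. This sketch uses plain `int[]` arrays and hypothetical names; it illustrates the textbook algorithm rather than the asker's code.

```java
import java.util.Arrays;

class MergeSortTree {
    /** Top-down merge sort: each call corresponds to one node of the recursion tree. */
    static int[] sort(int[] a) {
        if (a.length <= 1) return a; // one-element "external node": already sorted, just return it
        int mid = a.length / 2;
        int[] left = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    /** Merges two sorted arrays; taking from the left on ties keeps the sort stable. */
    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            out[k++] = left[i] <= right[j] ? left[i++] : right[j++];
        }
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```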
I'm writing a Java tree (JTree) in which tree nodes can have children that take a long time to compute (in this case, it's a file system, where network timeouts may prevent getting a list of files from an attached drive).
The problem I'm finding is this:
getChildCount() is called before the user specifically requests opening a particular branch of the tree. I believe this is done so the JTree knows whether to show a + icon next to the node.
An accurate count of children from getChildCount() would need to perform the potentially expensive operation
If I fake the value of getChildCount(), the tree only allocates space for that many child nodes before asking for an enumeration of the children. (If I return '1', I'll only see 1 child listed, despite that there are more)
Enumerating the children can be expensive and time-consuming; I'm okay with that. But I'm not okay with getChildCount() needing to know the exact number of children.
Any way I can work around this?
Added: The other problem is that if one of the nodes represents a floppy drive (how archaic!), the drive will be polled before the user asks for its files; if there's no disk in the drive, this results in a system error.
Update: Unfortunately, implementing a TreeWillExpandListener isn't the solution. It lets you veto an expansion, but the number of nodes shown is still limited by the value returned by TreeNode.getChildCount().
http://java.sun.com/docs/books/tutorial/uiswing/components/tree.html#data
Scroll down a little: there is a tutorial on how to create lazy-loading nodes for the JTree, complete with examples and documentation.
I'm not sure if it's entirely applicable, but I recently worked around problems with a slow tree by pre-computing the answers to methods that would normally require going through the list of children. I only recompute them when children are added or removed or updated. In my case, some of the methods would have had to go recursively down the tree to figure out things like 'how many bytes are stored' for each node.
If you need a lot of access to a particular feature of your data structure that is expensive to compute, it may make sense to pre-compute it.
In the case of TreeNodes, this means that your TreeNodes have to store their child count. In more detail: when you create a node n0, it has a child count (cc) of 0. When you add a node n1 as a child of n0, you increase n0.cc by n1.cc + 1.
The tricky bit is the remove operation: you have to keep backlinks to parents and walk up the hierarchy, subtracting the count of the removed subtree from each ancestor.
In case you just want a hasChildren feature for your nodes, or to override getChildCount, a boolean might be enough and would not force you to walk up the whole hierarchy on removal. Or you could drop the backlinks and accept that you lose precision on remove operations. The TreeNode interface doesn't actually force you to provide a remove operation, but you probably want one anyway.
Well, that's the deal. To get precise precomputed values, you will have to keep backlinks of some sort. If you don't, you'd better call your method hasHadChildren, or the more amusing isVirgin.
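A sketch of the bookkeeping this answer describes: each node caches its descendant count (the answer's cc), keeps a backlink to its parent, and adding or removing a child adjusts the count of every ancestor by the size of the moved subtree. The class and method names are hypothetical, and it is a standalone class rather than a full `javax.swing.tree.TreeNode` implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: each node caches the number of nodes below it (excluding itself). */
class CountingNode {
    private final List<CountingNode> children = new ArrayList<>();
    private CountingNode parent;   // backlink, needed to update ancestors
    private int descendantCount;   // the "cc" from the answer

    void add(CountingNode child) {
        children.add(child);
        child.parent = this;
        adjustAncestors(this, child.descendantCount + 1);    // cc += n1.cc + 1, up to the root
    }

    void remove(CountingNode child) {
        if (children.remove(child)) {
            child.parent = null;
            adjustAncestors(this, -(child.descendantCount + 1));
        }
    }

    private static void adjustAncestors(CountingNode from, int delta) {
        for (CountingNode n = from; n != null; n = n.parent) n.descendantCount += delta;
    }

    int getDescendantCount() { return descendantCount; }     // O(1), no traversal
    int getChildCount() { return children.size(); }
    boolean hasChildren() { return !children.isEmpty(); }
}
```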
There are a few parts to the solution:
Like Lorenzo Boccaccia said, use the TreeWillExpandListener
You also need to call nodesWereInserted on the tree model, so that the proper number of nodes is displayed. See this code
I have determined that if you don't know the child count, TreeNode.getChildCount() needs to return at least 1 (it can't return 0).
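Putting those parts together, a hedged sketch with the standard Swing classes: every unloaded node gets one placeholder child, so `getChildCount()` returns 1 and the expand handle appears without touching the drive. When the user expands it, a `TreeWillExpandListener` replaces the placeholder with the real children and calls `DefaultTreeModel.nodesWereInserted`. The `loader` function (the slow call, e.g. listing a drive) and the other names are hypothetical. The loader runs on the calling thread, normally the event-dispatch thread, and every loaded child is itself made lazy; a real version would skip that for plain files and might move loading to a background worker.

```java
import java.util.List;
import java.util.function.Function;
import javax.swing.event.ExpandVetoException;
import javax.swing.event.TreeExpansionEvent;
import javax.swing.event.TreeWillExpandListener;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;

/** Hypothetical sketch: lazy children via a placeholder child and a TreeWillExpandListener. */
class LazyTreeSupport implements TreeWillExpandListener {
    static final String PLACEHOLDER = "Loading...";

    private final DefaultTreeModel model;
    private final Function<Object, List<Object>> loader; // hypothetical slow child lookup

    LazyTreeSupport(DefaultTreeModel model, Function<Object, List<Object>> loader) {
        this.model = model;
        this.loader = loader;
    }

    /** One placeholder child makes getChildCount() return 1, so the expand handle shows. */
    static DefaultMutableTreeNode lazyNode(Object userObject) {
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(userObject);
        node.add(new DefaultMutableTreeNode(PLACEHOLDER));
        return node;
    }

    static boolean isUnloaded(DefaultMutableTreeNode node) {
        return node.getChildCount() == 1
            && PLACEHOLDER.equals(((DefaultMutableTreeNode) node.getChildAt(0)).getUserObject());
    }

    /** Swaps the placeholder for the real children and notifies the model. */
    void load(DefaultMutableTreeNode node) {
        if (!isUnloaded(node)) return;
        model.removeNodeFromParent((DefaultMutableTreeNode) node.getChildAt(0));
        List<Object> items = loader.apply(node.getUserObject());
        int[] indices = new int[items.size()];
        for (int i = 0; i < items.size(); i++) {
            node.add(lazyNode(items.get(i)));
            indices[i] = i;
        }
        if (indices.length > 0) model.nodesWereInserted(node, indices);
    }

    @Override
    public void treeWillExpand(TreeExpansionEvent event) throws ExpandVetoException {
        Object last = event.getPath().getLastPathComponent();
        if (last instanceof DefaultMutableTreeNode) load((DefaultMutableTreeNode) last);
    }

    @Override
    public void treeWillCollapse(TreeExpansionEvent event) { }
}
```

Usage would be along the lines of `tree.addTreeWillExpandListener(new LazyTreeSupport(model, loader))` on a `JTree` built from the same model.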