Reducing tree of attributes - java

I have a tree of several thousand nodes, decorated by boolean attributes, something like this (attributes in parentheses):
Root (x=true, y=true, z=false)
    Interior 1
        Leaf 1 (x=false, z=false)
        Leaf 2 (x=false, y=false, z=false)
    Interior 2
        Leaf 3
    etc.
What I would like to do is find the smallest number of decorations necessary to preserve the values of the attributes, given the following constraints/info:
1. Attributes are inherited by child nodes.
2. Only the resulting attributes of the leaf nodes are important (including inherited attributes). So if setting a "default" attribute on an interior node lets me drop a bunch of attributes on its children, that's okay.
3. There is a shorthand in our model for setting all attributes to either true or false. For example, (x=false,y=false,z=false) can be represented by one decorator, whereas (x=false,y=false,z=true) would take three.
4. Leaf nodes will greatly outnumber interior nodes (by at least 25 to 1).
5. The initial state of the tree will have many redundancies.
I'm using Java and adding an external lib to deal with this isn't a big deal.
These constraints are not flexible, as I'm working on an integration layer with a Large Enterprise System, so all I can do is try to minimize the number of attribute values we have to store and transmit.
I think constraint #3 is throwing me for a loop, because without it I could just deal with each attribute individually, which is simple (and I already implemented a solution to that before I realized more attributes were coming).
I hope this is descriptive enough to give a picture of the general problem. I can give more examples or information if required. Thank you!

I think (3.) can be mainly ignored because we'd only be interested in it for leaves.
Here's what I would suggest:
1. For every leaf with all booleans one way, use the shortcut (3.).
2. Then, for every internal node, set each attribute to the majority value among the leaves below it that were not handled by step 1, and remove the assignments that are now redundant.
3. For higher internal nodes, do the same, looking at their immediate children, all the way up to the root.
This is a heuristic, and I haven't tried it, but would be my first shot if I were you.
Let me know how it goes.
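The bottom-up majority idea can be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution to the full problem: the class names AttrNode and AttrReducer are made up, inherited values are first copied down to the leaves, and the all-true/all-false shorthand is only taken into account when counting decorators, not when choosing what to hoist.

```java
import java.util.*;

// Hypothetical node type: a name, explicit boolean attributes, and children.
final class AttrNode {
    final String name;
    final Map<String, Boolean> attrs = new TreeMap<>();
    final List<AttrNode> children = new ArrayList<>();
    AttrNode(String name) { this.name = name; }
    AttrNode set(String key, boolean value) { attrs.put(key, value); return this; }
    AttrNode add(AttrNode child) { children.add(child); return this; }
    boolean isLeaf() { return children.isEmpty(); }
}

final class AttrReducer {
    // Copy inherited values down to the leaves and clear all interior nodes.
    static void pushDown(AttrNode n, Map<String, Boolean> inherited) {
        Map<String, Boolean> effective = new TreeMap<>(inherited);
        effective.putAll(n.attrs);
        n.attrs.clear();
        if (n.isLeaf()) { n.attrs.putAll(effective); return; }
        for (AttrNode c : n.children) pushDown(c, effective);
    }

    // Bottom-up: give each interior node the majority value of its children
    // and drop the child assignments that became redundant.
    static void hoist(AttrNode n, Set<String> keys) {
        if (n.isLeaf()) return;
        for (AttrNode c : n.children) hoist(c, keys);
        for (String k : keys) {
            int known = 0, trues = 0;
            for (AttrNode c : n.children) {
                Boolean v = c.attrs.get(k);
                if (v != null) { known++; if (v) trues++; }
            }
            // If some child has no value, a default here could change it: skip.
            if (known < n.children.size()) continue;
            boolean majority = 2 * trues >= known;
            n.attrs.put(k, majority);
            for (AttrNode c : n.children) c.attrs.remove(k, majority);
        }
    }

    // Decorator count; a node setting every attribute to one value costs 1.
    static int cost(AttrNode n, Set<String> keys) {
        int own = n.attrs.size();
        if (own == keys.size() && own > 1
                && new HashSet<>(n.attrs.values()).size() == 1) own = 1;
        for (AttrNode c : n.children) own += cost(c, keys);
        return own;
    }

    // Effective (explicit plus inherited) attributes of every leaf, by name.
    static void collectLeaves(AttrNode n, Map<String, Boolean> inherited,
                              Map<String, Map<String, Boolean>> out) {
        Map<String, Boolean> effective = new TreeMap<>(inherited);
        effective.putAll(n.attrs);
        if (n.isLeaf()) out.put(n.name, effective);
        for (AttrNode c : n.children) collectLeaves(c, effective, out);
    }
}
```

Comparing collectLeaves before and after the reduction is a cheap way to check that no leaf changed its effective values, and cost gives a number to compare against other heuristics.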

Related

Split a tree into a forest using jgrapht

I have a tree represented with the jgrapht library. There are various types of nodes, and I need to cut out every subtree that starts at a particular node type.
As you can see in this example, the tree represents the source code of a Java class. I need to create multiple jgrapht objects by splitting the main tree at each node of type "Entry". In total I should get 7 trees from this big one. The structure I use is a DirectedPseudograph.
Although I'm not 100% clear about what you want, it seems there are various solution approaches.
Starting from every outgoing neighbor of the root node, you could run a depth-first search and record the nodes returned. The nodes reachable by the DFS belong to the same subtree. For this you can use the DepthFirstIterator.
You could create a subgraph without the root node, for instance by using the AsSubgraph class. You can then invoke the ConnectivityInspector on the resulting induced subgraph. Since every subtree is a disconnected graph component, the connectivity inspector will be able to find each of these components.
Btw, unless you need the capabilities of a Pseudograph, for performance it would be better to use the SimpleDirectedGraph. Obviously, the latter does not allow parallel edges or self-loops.
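To make the first approach concrete, here is a minimal sketch that deliberately uses plain java.util collections instead of JGraphT, so that it runs on its own. The class name SubtreeSplitter and the map-of-children representation are assumptions for illustration; with JGraphT, the hand-written stack loop would be replaced by a DepthFirstIterator started at each neighbor of the root.

```java
import java.util.*;

final class SubtreeSplitter {
    // For each child of root, collect every node reachable from that child.
    // 'children' maps a node to its outgoing neighbors (hypothetical model).
    static Map<String, Set<String>> split(Map<String, List<String>> children,
                                          String root) {
        Map<String, Set<String>> subtrees = new LinkedHashMap<>();
        List<String> none = Collections.emptyList();
        for (String start : children.getOrDefault(root, none)) {
            Set<String> seen = new LinkedHashSet<>();
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                String v = stack.pop();
                if (!seen.add(v)) continue; // already visited
                for (String w : children.getOrDefault(v, none)) stack.push(w);
            }
            subtrees.put(start, seen);
        }
        return subtrees;
    }
}
```

Each resulting node set could then be turned into its own graph object, for example by building an induced subgraph over it.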

Graph algorithm to find the most likely ancestor of a node

I'm working on the Wikipedia Category Graph (WCG). In the WCG, each article is associated to multiple categories.
For example, the article "Lists_of_Israeli_footballers" is linked to multiple categories, such as:
Lists of association football players by nationality - Israeli footballers - Association football in Israel lists
Now, if you climb back up the category tree, you are likely to find a lot of paths leading up to the "Football" category, but there is also at least one path leading up to "Science", for example.
This is problematic because my final goal is to determine whether an article belongs to a given category using the list of categories it is linked to: right now a simple ancestor search gives false positives (for example, it identifies "Israeli footballers" as part of the "Science" category, which is obviously not the expected result).
I want an algorithm able to find out what the most likely ancestor is.
I thought about two main solutions:
Count the number of distinct paths in the WCG linking the article's category vertices to the candidate ancestor category (and use the number of paths linking to other categories of the same depth for comparison)
Use some kind of clustering algorithm and make ancestor search queries in isolated graph spaces
The issue with those options is that they seem to be very costly considering the size of the WCG (2 million vertices and even more edges). I could possibly work with a solution that uses a preprocessing step in O(n) or more to achieve O(1) queries later, but I need the queries to be very fast overall.
Are there existing solutions to my problem? Open to all suggestions.
Np, thanks for clarifying. Anything like clustering is probably not a good idea, because those types of algorithms are meant to determine a category for an object that is not associated with a category yet. In your problem every object (footballer article) is already associated with several categories.
You should probably do a complete search through all articles and save the matched categories with each article in a hash table, so that you can then retrieve this category information when you need it for a new article.
Whether or not a category is relevant for an article seems totally arbitrary to me and seems to be something you should decide for yourself (e.g. require a threshold of 5 links to a category before the article is considered part of that category).
If you're getting these articles from Wikipedia you're probably going to have a pretty long run working through the entire tree, but in my opinion it seems like it's your only choice.
Search with DFS, and each time you find an article-category match, save the article in a hash table (you need to be able to reduce an article to a unique identifier).
This is probably the vaguest answer I've ever posted here, and your question might be too broad... if this doesn't help you, please let me know so I can consider removing it in order to avoid confusing future readers.
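As a rough illustration of combining the path-counting idea from the question with the threshold suggested above, the sketch below counts distinct upward paths with memoization. The class name AncestorPaths, the parent-map representation, and the assumption that the graph is acyclic are all mine; a graph with cycles would need explicit cycle handling, and a graph of this size would need an iterative traversal rather than recursion.

```java
import java.util.*;

final class AncestorPaths {
    // Maps a category to its direct parent categories (hypothetical model).
    private final Map<String, List<String>> parents;

    AncestorPaths(Map<String, List<String>> parents) { this.parents = parents; }

    // Number of distinct upward paths from 'from' to 'target'.
    // Assumes the category graph has no cycles.
    long countPaths(String from, String target) {
        return count(from, target, new HashMap<String, Long>());
    }

    private long count(String node, String target, Map<String, Long> memo) {
        if (node.equals(target)) return 1;
        Long cached = memo.get(node);
        if (cached != null) return cached;
        long total = 0;
        List<String> none = Collections.emptyList();
        for (String p : parents.getOrDefault(node, none)) {
            total += count(p, target, memo);
        }
        memo.put(node, total); // each node is expanded once per query
        return total;
    }

    // An article "belongs" to target if its categories reach it often enough.
    boolean belongsTo(Collection<String> articleCategories, String target,
                      long threshold) {
        long total = 0;
        for (String c : articleCategories) total += countPaths(c, target);
        return total >= threshold;
    }
}
```

The threshold is arbitrary, as noted above, and would have to be tuned against examples where the expected answer is known.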

Given a large tree structure, is there an efficient algorithm to do querying or filtering on the tree?

Let's say I wanted all nodes whose parent(s) matched some certain condition.
Is there an accepted way of doing this other than inspecting each node and building a results object full of either nodes or subtrees?
If the tree is not already sorted or indexed based on the search condition in some way, then you cannot prune the tree traversal (i.e. you cannot decide to not take the right child at some particular node, for instance). Therefore, you have no choice but to traverse the entire tree.
That's pretty much it. You simply have to access each node to see whether it matches the criteria.
But there are some ways to speed it up:
Use an index. If you are repeatedly querying the same property, it might be beneficial to create an index on that property and use it for searching. This could speed up your code immensely. Doing so is not free, though: you need to calculate the index up front, update it every time you update the tree, and you need more memory to keep it.
If you have a multi-core machine, you can process individual subtrees in parallel by using separate threads.
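A minimal sketch of the index idea, assuming a hypothetical node type with a single string property called tag: one full traversal groups every node under its parent's tag, after which "all nodes whose parent has tag X" becomes a map lookup. The index is not updated automatically, so it would have to be rebuilt or patched whenever the tree changes.

```java
import java.util.*;

// Hypothetical node type with one queryable property.
final class TagNode {
    final String name;
    final String tag;
    final List<TagNode> children = new ArrayList<>();
    TagNode(String name, String tag) { this.name = name; this.tag = tag; }
    TagNode add(TagNode child) { children.add(child); return this; }
}

final class ParentTagIndex {
    private final Map<String, List<TagNode>> byParentTag = new HashMap<>();

    // Built once with a full traversal: O(n) time and extra memory.
    ParentTagIndex(TagNode root) { build(root); }

    private void build(TagNode n) {
        for (TagNode c : n.children) {
            byParentTag.computeIfAbsent(n.tag, k -> new ArrayList<>()).add(c);
            build(c);
        }
    }

    // All nodes whose parent carries the given tag, without a traversal.
    List<TagNode> childrenOfTag(String tag) {
        List<TagNode> none = Collections.emptyList();
        return byParentTag.getOrDefault(tag, none);
    }
}
```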

Draw nodes in e.g. a Chord ring

I have a set of nodes that I would like to put into a ring. They all have a numeric property which I would like to use as a reference when putting them into the ring.
E.g., the node with param 32 comes after the node with param 22.
What I really need is a library (or something like that) which can make it possible to have the correct "distance" between the nodes, e.g: between 22 and 32 is 10 "units", and between 32 and 35 is 3 "units" where "units" may be an empty numeric slot.
Sounds like you need a sorted list where the end links to the start. I know of no standard implementation, but it would be pretty easy to implement one yourself.
Something like a doubly linked list with the head and tail connected would work. Add operations would have to traverse the list to find the appropriate position to insert into, making insert an O(n) operation. This would make your list perform relatively poorly, with pretty much all standard list operations being O(n).
You could implement a distanceToNext and/or distanceToPrevious pretty easily by just getting the values of the current and next/previous nodes and returning the difference.
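As one possible alternative to a hand-written circular linked list, the sketch below keeps the node ids in a java.util.TreeSet and wraps around manually, which makes inserts and neighbor lookups O(log n). The class name Ring and the fixed size of the id space are assumptions made for the example.

```java
import java.util.TreeSet;

final class Ring {
    private final TreeSet<Integer> ids = new TreeSet<>();
    private final int size; // number of slots in the id space (assumption)

    Ring(int size) { this.size = size; }

    void add(int id) { ids.add(Math.floorMod(id, size)); }

    // Next node clockwise from id, wrapping to the smallest id at the end.
    // Throws NoSuchElementException if the ring is empty.
    int next(int id) {
        Integer n = ids.higher(id);
        return n != null ? n : ids.first();
    }

    // Number of "units" between id and the next node, wrapping around.
    int distanceToNext(int id) {
        return Math.floorMod(next(id) - id, size);
    }
}
```

With nodes 22, 32 and 35 in a ring of 64 slots, distanceToNext(22) is 10 and distanceToNext(32) is 3, matching the distances described in the question.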
Edit:
Just realised from the question title that you are probably looking for some GUI library to draw these and I just hinted at the model you might use. I'll have a think about the GUI.
Edit 2:
Your problem boils down to how do you draw a polygon when you only know the length of the sides. I asked on the maths stack exchange for you.

Java TreeNode: How to prevent getChildCount from doing expensive operation?

I'm writing a Java Tree in which tree nodes could have children that take a long time to compute (in this case, it's a file system, where there may be network timeouts that prevent getting a list of files from an attached drive).
The problem I'm finding is this:
getChildCount() is called before the user specifically requests opening a particular branch of the tree. I believe this is done so the JTree knows whether to show a + icon next to the node.
An accurate count of children from getChildCount() would need to perform the potentially expensive operation
If I fake the value of getChildCount(), the tree only allocates space for that many child nodes before asking for an enumeration of the children. (If I return '1', I'll only see 1 child listed, even though there are more.)
The enumeration of the children can be expensive and time-consuming, I'm okay with that. But I'm not okay with getChildCount() needing to know the exact number of children.
Any way I can work around this?
Added: The other problem is that if one of the nodes represents a floppy drive (how archaic!), the drive will be polled before the user asks for its files; if there's no disk in the drive, this results in a system error.
Update: Unfortunately, implementing the TreeWillExpand listener isn't the solution. That can allow you to veto an expansion, but the number of nodes shown is still restricted by the value returned by TreeNode.getChildCount().
http://java.sun.com/docs/books/tutorial/uiswing/components/tree.html#data
Scroll down a little: there is a tutorial on exactly how to create lazy-loading nodes for the JTree, complete with examples and documentation.
I'm not sure if it's entirely applicable, but I recently worked around problems with a slow tree by pre-computing the answers to methods that would normally require going through the list of children. I only recompute them when children are added or removed or updated. In my case, some of the methods would have had to go recursively down the tree to figure out things like 'how many bytes are stored' for each node.
If you need a lot of access to a particular feature of your data structure that is expensive to compute, it may make sense to pre-compute it.
In the case of TreeNodes, this means that your TreeNodes would have to store their child count. To explain it in a bit more detail: when you create a node n0, this node has a child count (cc) of 0. When you add a node n1 as a child of this one, you add n1's count plus one to it (cc += n1.cc + 1).
The tricky bit is the remove operation. You have to keep backlinks to parents and go up the hierarchy to subtract the cc of your current node.
In case you just want to have a hasChildren feature for your nodes or override getChildCount, a boolean might be enough and would not force you to go up the whole hierarchy in case of removal. Or you could remove the backlinks and just say that you lose precision on remove operations. The TreeNode interface actually doesn't force you to provide a remove operation, but you probably want one anyway.
Well, that's the deal. In order to come up with precomputed precise values, you will have to keep backlinks of some sorts. If you don't you'd better call your method hasHadChildren or the more amusing isVirgin.
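The bookkeeping described above might look roughly like this. It is a sketch with made-up names (CountingNode, descendantCount), not an implementation of the Swing TreeNode interface: each node caches how many nodes sit below it and keeps a backlink to its parent so that adds and removes can be propagated upward.

```java
import java.util.ArrayList;
import java.util.List;

final class CountingNode {
    private CountingNode parent; // backlink needed to update ancestors
    private final List<CountingNode> children = new ArrayList<>();
    private int descendantCount; // cached, never recomputed by traversal

    void add(CountingNode child) {
        children.add(child);
        child.parent = this;
        adjust(child.descendantCount + 1); // the child plus everything below it
    }

    boolean remove(CountingNode child) {
        if (!children.remove(child)) return false;
        child.parent = null;
        adjust(-(child.descendantCount + 1));
        return true;
    }

    // Walk up the backlinks and apply the change to this node and its ancestors.
    private void adjust(int delta) {
        for (CountingNode n = this; n != null; n = n.parent) {
            n.descendantCount += delta;
        }
    }

    int getChildCount() { return children.size(); }
    int getDescendantCount() { return descendantCount; }
    boolean hasChildren() { return !children.isEmpty(); }
}
```

Every add or remove costs one walk to the root, which is the price for answering the count queries in constant time.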
There are a few parts to the solution:
Like Lorenzo Boccaccia said, use the TreeWillExpandListener
Also, you need to call nodesWereInserted on the tree model, so the proper number of nodes will be displayed. See this code
I have determined that if you don't know the child count, TreeNode.getChildCount() needs to return at least 1 (it can't return 0)
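Put together, the parts listed above could be sketched as follows. This is an assumption-laden illustration rather than the code the answer links to: LazyLoader and childSource are hypothetical names, every unloaded node carries a single placeholder child so that getChildCount() returns 1, and the real children are only fetched when the node is about to expand. DefaultTreeModel.insertNodeInto is used because it fires nodesWereInserted itself.

```java
import java.util.List;
import java.util.function.Function;
import javax.swing.event.TreeExpansionEvent;
import javax.swing.event.TreeWillExpandListener;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;

final class LazyLoader implements TreeWillExpandListener {
    private static final Object PLACEHOLDER = "Loading...";
    private final DefaultTreeModel model;
    // Hypothetical slow lookup, e.g. listing the files of a directory.
    private final Function<Object, List<?>> childSource;

    LazyLoader(DefaultTreeModel model, Function<Object, List<?>> childSource) {
        this.model = model;
        this.childSource = childSource;
    }

    // A node that reports one child until its real children are loaded.
    static DefaultMutableTreeNode lazyNode(Object userObject) {
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(userObject);
        node.add(new DefaultMutableTreeNode(PLACEHOLDER));
        return node;
    }

    @Override
    public void treeWillExpand(TreeExpansionEvent event) {
        load((DefaultMutableTreeNode) event.getPath().getLastPathComponent());
    }

    @Override
    public void treeWillCollapse(TreeExpansionEvent event) { }

    void load(DefaultMutableTreeNode node) {
        if (node.getChildCount() != 1) return;
        DefaultMutableTreeNode first = (DefaultMutableTreeNode) node.getFirstChild();
        if (first.getUserObject() != PLACEHOLDER) return; // already loaded
        model.removeNodeFromParent(first);
        for (Object child : childSource.apply(node.getUserObject())) {
            // insertNodeInto notifies the tree via nodesWereInserted
            model.insertNodeInto(lazyNode(child), node, node.getChildCount());
        }
    }
}
```

The listener would be registered with tree.addTreeWillExpandListener(loader). In this simplified form every loaded child is itself lazy and therefore shown as expandable, and the expensive lookup runs on the event thread, so a real implementation would probably move it to a background worker.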
