I have often read/heard that the composite pattern is a good solution for representing hierarchical data structures like binary trees. It is a great way to explain the pattern, because internal nodes are composite objects and leaves are leaf objects, and I can appreciate that the pattern makes it easy to visit every element in a uniform way.
However, I am not so sure it is the best example if you consider a tree that is filled on demand (every time an insert method is executed), because we would have to convert a leaf into a composite object many times (e.g. whenever a leaf has to add a child). To convert a leaf object, I can only imagine a tricky approach like this (inspired, I guess, by become: from Smalltalk):
aComposite = aLeaf.becomeComposite();
aComposite.addChild(newElement);
//destroy aLeaf (bad time performance)
To sum up, is a tree-like structure a good example for illustrating the composite pattern if this kind of structure commonly starts out empty and you then have to add/insert elements?
GoF states the Intent of Composite as follows:
"Compose objects into tree structures to represent part-whole hierarchies. ..... treat individual object and compositions of objects uniformly"
So a tree is not so much a structure to illustrate Composite; rather, a tree is the structure by which Composite is defined and operates. It's also worth remembering that for the purposes of Composite, a tree can be a binary tree (two children), a linked list (one child), or composed of nodes with a variable number of children.
It's quite normal to build a tree from nothing. Consider an arithmetic expression parser building a composite "parse" tree. The parser will start from nothing and create leaf nodes for terminal symbols (like +, -, *, /, parentheses, numbers) and composite nodes that combine the terminals and perform the calculations. The parser constructs the tree such that invoking evaluate() on the head node will cause a traversal that evaluates the expression.
I use this example to show that a tree can be built bottom up, never having to "convert a leaf into a composite object".
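To make that concrete, here is a minimal bottom-up sketch in Java (the class names are illustrative, not taken from GoF):

// Component: the uniform interface shared by leaves and composites.
interface Expr {
    double evaluate();
}

// Leaf: a terminal symbol (a number).
class Num implements Expr {
    private final double value;
    Num(double value) { this.value = value; }
    public double evaluate() { return value; }
}

// Composite: combines child expressions and performs the calculation.
class Add implements Expr {
    private final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public double evaluate() { return left.evaluate() + right.evaluate(); }
}

public class CompositeDemo {
    public static void main(String[] args) {
        // Leaves are created first and then composed; no leaf is ever
        // converted into a composite.
        Expr tree = new Add(new Num(2), new Add(new Num(3), new Num(4)));
        System.out.println(tree.evaluate()); // prints 9.0
    }
}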
If your application builds the tree top down, or progressively in stages, it's hard to see why that matters, because the build process will consist of creating appropriate nodes and inserting them in a way that makes sense for the application.
If converting leaf nodes to composite nodes is problematic in a specific application, then for sure you may want to look at ways to minimise the overhead in that situation. But the structure only has to be a valid Composite once the tree is built!
I am trying to use ANTLR to extract information from a PL/SQL file. I am using the porcelli PLSQL grammar, with which ANTLR produces an AST from my input PL/SQL file. I need to read the returned CommonTree class (which represents the AST) and obtain different pieces of information, say the names of tables and their related columns. I was wondering whether it would make sense to use the visitor pattern to collect information about the tables and the columns related to a particular table. For instance, a query like this
SELECT s.name from students s, departments d WHERE d.did=10 and s.sid=d.did
will be shown in the AST as a deeply nested tree (the AST diagram is not reproduced here).
Obtaining table names and related columns here involves capturing the aliases first from the FROM element and then matching them with the columns used in the SELECT_LIST. Information about tables and columns is hidden deep in leaf nodes, under repeatedly used elements such as ANY_ELEMENT.
So, how would one go about using a visitor pattern here? Would I end up with way too many visitors, because there are potentially a lot of element types? Is the Visitor pattern even relevant here?
EDIT
After thinking it over for a while, I am nearing the conclusion that the Visitor pattern wouldn't make sense in this scenario. Given that the data structure to be visited is a tree, and that there are potentially so many node types (select, update, insert, delete, from, where, into, ...), defining what should happen on visiting each of these node types for any given visitor could result in hundreds of methods per visitor class!
As updated in my last edit, I resolved this by not implementing the visitor pattern, because such a pattern would require me to handle every node type, and for PL/SQL there are too many.
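The alternative that falls out of this conclusion is a plain recursive walk that reacts only to the token types of interest. A minimal sketch, assuming ANTLR 3's Tree API (the tokenType argument stands in for whatever token-type constants your generated parser actually exposes):

import java.util.ArrayList;
import java.util.List;
import org.antlr.runtime.tree.Tree;

public class AstWalker {
    // Walk the whole AST once, collecting the text of every node whose
    // token type matches; no per-node-type visit methods are needed.
    public static List<String> collect(Tree node, int tokenType) {
        List<String> hits = new ArrayList<String>();
        walk(node, tokenType, hits);
        return hits;
    }

    private static void walk(Tree node, int tokenType, List<String> hits) {
        if (node.getType() == tokenType) {
            hits.add(node.getText());
        }
        for (int i = 0; i < node.getChildCount(); i++) {
            walk(node.getChild(i), tokenType, hits);
        }
    }
}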
This is actually more of a Lucene question, but it's in the context of a neo4j database.
I have a database that's divided into 50 or so node types (so "collections" or "tables" in other types of dbs). Each has a subset of properties that need to be indexed, some share the same name, some don't.
When searching, I always want to find nodes of a specific type, never across all nodes.
I can see three ways of organizing this (sketched in code after the list):
1. One index per type; properties map naturally to index fields: index 'foo', 'id'='1234'.
2. A single global index; each field maps to a property name. To distinguish the type, either include it as part of the value ('id'='foo:1234') or check the nodes once they're returned (I expect duplicates to be very rare).
3. A single index; the type is part of the field name: 'foo.id'='1234'.
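To make the three layouts concrete, here is a rough sketch against Neo4j's Java index API (names are illustrative, and writes would need to run inside a transaction):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.index.Index;

public class IndexLayouts {
    static void illustrate(GraphDatabaseService db, Node node) {
        // Option 1: one index per type, natural field names.
        Index<Node> fooIndex = db.index().forNodes("foo");
        fooIndex.add(node, "id", "1234");

        // Option 2: one global index, type encoded in the value.
        Index<Node> global = db.index().forNodes("global");
        global.add(node, "id", "foo:1234");

        // Option 3: one global index, type encoded in the field name.
        global.add(node, "foo.id", "1234");
    }
}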
Once created, the database is read-only.
Are there any benefits to one of those, in terms of convenience, size/cache efficiency, or performance?
As I understand it, for the first option neo4j will create a separate physical index for each type, which seems suboptimal. For the third, I'd end up with most Lucene docs having only a small subset of the fields; I'm not sure whether that affects anything.
I came across this problem recently when I was building an ActiveRecord connection adapter for Neo4j over REST, to be used in a Rails project. Since both ActiveRecord and ActiveRelation are tightly coupled to SQL syntax, it became difficult to fit everything into NoSQL. It might not be the best solution, but here's how I solved it:
Created an index named model_index which indexes nodes under two keys, type and model.
Index lookup with the type key currently happens with just one value, model. This was introduced primarily to achieve SHOW TABLES SQL functionality, which gets me a list of all models present in the graph.
Index lookup with the model key takes place with values corresponding to the different model names in my system. This is primarily for achieving DESC <TABLENAME> functionality.
With each table creation, as in CREATE TABLE, a node is created, with the table definition attributes stored as node properties.
The created node is indexed under model_index with type:model and model:<model-name>. This includes the newly created model in the list of 'tables' and also allows one to reach the model node directly via an index lookup with the model key.
For each record created per model (type, in your case), an outgoing edge labeled instances is created, directed from the model node to the new record: v[123] :=> [instances] :=> v[245], where v[123] represents the model node and v[245] represents a record of v[123]'s type.
Now if you want to get all instances of a specified type, you can look up model_index with model:<model-name> to reach the model node, and then fetch all adjacent nodes over outgoing edges labeled instances. Filtered lookups can be achieved further by applying filters and other complex traversals.
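In embedded Java terms the lookup is roughly the following (a sketch against Neo4j's 1.x API; my adapter actually goes through REST and Gremlin):

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.index.Index;

public class ModelLookup {
    static final DynamicRelationshipType INSTANCES =
            DynamicRelationshipType.withName("instances");

    // One index lookup to the model node, then a single-level traversal
    // over outgoing "instances" edges to reach its records.
    static Iterable<Relationship> instancesOf(GraphDatabaseService db,
                                              String modelName) {
        Index<Node> modelIndex = db.index().forNodes("model_index");
        Node model = modelIndex.get("model", modelName).getSingle();
        return model.getRelationships(Direction.OUTGOING, INSTANCES);
    }
}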
The above solution keeps model_index from clogging up, since it holds only two entries per model, and it achieves an effective record lookup via one index lookup plus a single-level traversal.
Although in your case nodes of different types are not adjacent to each other, even if you wanted them to be, you could determine the type of any arbitrary node by simply looking up its adjacent node along an incoming edge labeled instances. Further, I'm considering incorporating SpringDataGraph's pattern of storing a __type__ property on each instance node to avoid this adjacent-node lookup.
I'm currently translating AREL to Gremlin scripts for almost everything. You can find the source code for my AR adapter at https://github.com/yournextleap/activerecord-neo4j-adapter
Hope this helps, Cheers! :)
A single index will be smaller than several little indexes, because some data, such as the term dictionary, will be shared. However, since a term dictionary lookup is an O(lg(n)) operation, a lookup in a bigger term dictionary might be a little slower. (If you merged 50 indexes, this would only require about 6 more comparisons, since 2^6 >= 50; you likely won't notice any difference.)
Another advantage of a smaller index is that the OS cache is likely to make queries run faster.
Instead of your options 2 and 3, I would index two different fields, id and type, and search for (id:ID AND type:TYPE), but I don't know whether that is possible with neo4j.
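In raw Lucene the combined query would look something like this (a sketch against the pre-4.x BooleanQuery API; whether neo4j's index layer lets you express it is the open question):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class TypedIdQuery {
    // Both clauses are required, so only documents matching the given
    // type AND id are returned.
    static BooleanQuery build(String type, String id) {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("type", type)), Occur.MUST);
        query.add(new TermQuery(new Term("id", id)), Occur.MUST);
        return query;
    }
}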
spring-data-neo4j is using the first approach - it creates a different index for each type. So I guess that's a good option for the general scenario. But in your particular case it might be suboptimal, as you say. I'd run some benchmarks to measure the performance.
The other two, by the way, seem a bit artificial. You are possibly indexing completely unrelated information in the same index, which doesn't sound right.
How can I use a Guava Multimap to represent the following XML:
<node key="abc123">some value</node>
<node key="asdf22">
<node key="as234">some value343</node>
<node key="sggg234">some value234234</node>
</node>
In my XML files, 90% of the time a given node element will not have inner nodes (maybe I can optimize for that?).
What I wanted was a KeyValuePair collection, where the value is another collection of keyValuePairs.
In C# I could do:
public Dictionary<string, List<KeyValuePair<string, string>>> nodes;
I was told to look at Guava's Multimap, but I'm not sure how to use it correctly; can someone help me out?
BTW, since in 90% of the cases I don't need the value to be another list, could I somehow optimize for this situation?
I'm not sure a Multimap is the best fit for what you've described here. A multimap is for when you have multiple values for the same key. That's not what your XML describes: your XML describes a hierarchy, where each value has a path, possibly with multiple keys concatenated to describe its location within the hierarchy.
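For contrast, this is what a Multimap actually models, a flat mapping with several values per key and no nesting (minimal sketch):

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;

public class MultimapDemo {
    public static void main(String[] args) {
        Multimap<String, String> mm = ArrayListMultimap.create();
        mm.put("asdf22", "some value343");
        mm.put("asdf22", "some value234234");
        System.out.println(mm.get("asdf22")); // [some value343, some value234234]
    }
}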
From a modeling standpoint, I'd consider going with two types of XML nodes: one holds values, and the other holds nodes (think files vs. directories). You'll need to adjust your XML, though, to make a clear delineation between the two types. It'll be easier to parse that way.
So, in rough pseudocode:
import java.util.HashMap;
import java.util.Map;

class Container {
    private final Map<String, String> keyValuePairs = new HashMap<String, String>(); // values at this level
    private final Map<String, Container> children = new HashMap<String, Container>(); // nested containers
}
This gives you infinite depth: each container can hold key/value pairs as well as other containers. The top-level node should be the root container. Recursion should be trivial, and traversal by key/separator should also be easy.
Possibly more flexible than you need. Trim as necessary.
You could also merge the concept of the values/containers, but then your modeling gets a bit uglier. It's a tradeoff.
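A hypothetical usage sketch for the XML in the question (written as a main method inside Container, so the private fields are reachable):

public static void main(String[] args) {
    Container root = new Container();
    root.keyValuePairs.put("abc123", "some value");       // value entry
    Container inner = new Container();                    // <node key="asdf22">
    inner.keyValuePairs.put("as234", "some value343");
    inner.keyValuePairs.put("sggg234", "some value234234");
    root.children.put("asdf22", inner);                   // child container
}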
I am implementing a schema matching algorithm. I need to perform schema structure matching, so I need to represent each schema as an is-a/has-a relationship graph, one graph per schema.
Each node in the relational model will represent a table, with an is-a relationship to 'table' and one has-a relationship for each column (each column having its own is-a).
My question is how best to implement this in Java. Comparing graphs will be pseudo-polynomial in the graph size and may throw an out-of-memory error if we pull in the complete schema. I want to find nodes with similar relationships in both graphs (this will lead to DFS).
Is there an existing Java implementation that can do this? I have already explored JGraphT and JUNG, but I'm not sure which one is best suited. Please help.
Thanks in advance!
Whatever graph API you use ought to allow you to do something like this:
boolean equal = graph1.equals(graph2);
where that evaluates to true if the node sets and edge sets are equal. The nodes would need IDs or else content so that you could establish actual equality, as opposed to graph isomorphism.
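For instance, with JGraphT (one of the libraries you mention) a content-based comparison could be sketched like this; structurallyEqual is a hypothetical helper, not a library method:

import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultEdge;

public class GraphEquality {
    // Equal when both graphs contain the same vertices (compared by
    // content, here strings) and edges between the same endpoints.
    static boolean structurallyEqual(Graph<String, DefaultEdge> a,
                                     Graph<String, DefaultEdge> b) {
        if (!a.vertexSet().equals(b.vertexSet())) return false;
        if (a.edgeSet().size() != b.edgeSet().size()) return false;
        for (DefaultEdge e : a.edgeSet()) {
            if (!b.containsEdge(a.getEdgeSource(e), a.getEdgeTarget(e))) {
                return false;
            }
        }
        return true;
    }
}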
Is that what you are asking?
From the Java API, it seems that the nodes of a JTree are instances of TreeNode. In addition, all the path components in the returned TreePaths seem to be instances of TreeNode. So, why are TreePaths represented by Object[] instead of TreeNode[]? This gives rise to inconvenience when using these classes.
Thanks.
See this explanation from the Java tutorial:
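The practical consequence is that the cast is the caller's responsibility. A small sketch (safe only when you know the model really uses TreeNode subclasses such as DefaultMutableTreeNode):

import javax.swing.JTree;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.TreePath;

public class PathExample {
    // TreePath components are typed as Object because TreeModel accepts
    // any Object as a node; cast only when your model guarantees the type.
    static String lastNodeText(JTree tree) {
        TreePath path = tree.getSelectionPath();
        if (path == null) return null;
        Object last = path.getLastPathComponent(); // Object, not TreeNode
        DefaultMutableTreeNode node = (DefaultMutableTreeNode) last;
        return String.valueOf(node.getUserObject());
    }
}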
Interestingly, the TreeModel interface accepts any kind of object as a tree node. It does not require that nodes be represented by DefaultMutableTreeNode objects, or even that nodes implement the TreeNode interface. Thus, if the TreeNode interface is not suitable for your tree model, feel free to devise your own representation for tree nodes. For example, if you have a pre-existing hierarchical data structure, you do not need to duplicate it or force it into the TreeNode mold. You just need to implement your tree model so that it uses the information in the existing data structure.