How can I use a Guava Multimap to represent the following XML:
<node key="abc123">some value</node>
<node key="asdf22">
<node key="as234">some value343</node>
<node key="sggg234">some value234234</node>
</node>
In my XML files, 90% of the time a given node element will not have inner nodes (in case that's something I can optimize for?).
What I wanted was a KeyValuePair collection, where the value is another collection of keyValuePairs.
In C# I could do:
public Dictionary<string, List<KeyValuePair<string, string>>> nodes;
I was told to look at Guava's Multimap, but I'm not sure how to use it correctly. Can someone help me out?
BTW, since in 90% of cases I don't need the value to be another list, could I somehow optimize for this situation?
Not sure a Multimap is the best fit for what you've described here. A multimap is for when you have multiple values for the same key. That's not what your XML describes: your XML describes a hierarchy, where each value has a path, possibly with multiple keys concatenated to describe its location within the hierarchy.
From a modeling standpoint, I'd consider going with two types of XML nodes: one holds values, and the other holds nodes (think files vs. directories). You'll need to change your XML, though, to make a clear delineation between the two types. It'll be easier to parse that way.
So, in rough pseudocode:
import java.util.HashMap;
import java.util.Map;

class Container {
    private Map<String, String> keyValuePairs = new HashMap<>(); // leaf values
    private Map<String, Container> children = new HashMap<>();   // nested containers
}
This gives you infinite depth. Each container can hold key/value pairs as well as other containers. The top-level node should be the root container. Recursion should be trivial. Traversal by key/separator should also be easy.
Possibly more flexible than you need. Trim as necessary.
You could also merge the concept of the values/containers, but then your modeling gets a bit uglier. It's a tradeoff.
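For illustration, a path lookup over that structure might look like this (the separator and the lookup method are my own sketch, not from any library):

// Walk "a/b/c" down the child containers, then read the final
// segment from the key/value map of the container we land on.
static String lookup(Container root, String path) {
    String[] parts = path.split("/");
    Container current = root;
    for (int i = 0; i < parts.length - 1; i++) {
        current = current.children.get(parts[i]);
        if (current == null) {
            return null; // no such branch
        }
    }
    return current.keyValuePairs.get(parts[parts.length - 1]);
}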
I have often read/heard that the composite pattern is a good solution for representing hierarchical data structures like binary trees. Trees are a great way to explain this pattern, because internal nodes are composite objects and leaves are leaf objects, and I can appreciate that the pattern makes it easy to visit every element in a uniform way.
However, I am not so sure it is the best example if you consider a tree that is filled on demand (every time an insert method is executed), because we would have to convert a leaf into a composite object many times (e.g., whenever a leaf has to add a child). To convert a leaf object, I imagine something tricky like this (inspired by Smalltalk's become:, I guess):
aComposite = aLeaf.becomeComposite();
aComposite.addChild(newElement);
// destroy aLeaf (bad time performance)
To sum up: is a tree-like structure a good example for illustrating the composite pattern, given that this kind of structure is commonly born empty and you then have to add/insert elements?
GoF states the Intent of Composite as follows:
"Compose objects into tree structures to represent part-whole hierarchies. ..... treat individual object and compositions of objects uniformly"
So a tree is not so much a structure used to illustrate Composite; rather, a tree is the structure by which Composite is defined and operates. It's also worth remembering that, for the purposes of Composite, a tree can be a binary tree (two children), a linked list (one child), or composed of nodes with a variable number of children.
It's quite normal to build a tree from nothing. Consider an arithmetic expression parser building a composite parse tree. The parser will start from nothing, create leaf nodes for terminal symbols (like +, -, *, /, braces, numbers) and composite nodes that combine the terminals to perform the calculations. The parser constructs the tree such that invoking evaluate() on the head node will cause a traversal that evaluates the expression.
I use this example to show that a tree can be built bottom-up, never having to "convert a leaf to a composite object".
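A bare-bones sketch of that bottom-up construction (the class names are illustrative):

// Component: uniform interface for leaves and composites
interface Expr {
    double evaluate();
}

// Leaf: a terminal number
class Num implements Expr {
    private final double value;
    Num(double value) { this.value = value; }
    public double evaluate() { return value; }
}

// Composite: an operator node combining two child expressions
class Add implements Expr {
    private final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public double evaluate() { return left.evaluate() + right.evaluate(); }
}

// "1 + 2 + 3": leaves are created first, then composed; no leaf
// ever has to be converted into a composite.
Expr tree = new Add(new Add(new Num(1), new Num(2)), new Num(3));
double result = tree.evaluate(); // 6.0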
If your application builds the tree top-down, or progressively in stages, it's hard to see why that matters, because the build process will consist of creating appropriate nodes and inserting them in a way that makes sense for the application.
If converting leaf nodes to composite nodes is problematic in a specific application, then for sure you may need to look at ways to minimise the overhead in that situation. But it's only a valid Composite structure once the tree is built!
I'm reviewing the capabilities of Google's Guava API and I ran into a data structure that I haven't seen used in my 'real world programming' experience, namely the BiMap. Is the only benefit of this construct the ability to quickly retrieve a key for a given value? Are there any problems where the solution is best expressed using a BiMap?
Any time you want to be able to do a reverse lookup without having to populate two maps. For instance, a phone directory where you would like to look up the phone number by name, but would also like to do a reverse lookup to get the name from the number.
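For example, with Guava's HashBiMap (the directory entries are made up):

import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;

BiMap<String, String> directory = HashBiMap.create();
directory.put("Alice", "555-1234");
directory.put("Bob", "555-9876");

String number = directory.get("Alice");             // "555-1234"
String name = directory.inverse().get("555-9876");  // "Bob"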
Louis mentioned the memory savings possible in a BiMap implementation. That's the only thing that you can't get by wrapping two Map instances. Still, if you let us wrap the Map instances for you, we can take care of a few edge cases. (You could handle all these yourself, but why bother? :))
If you call put(newKey, existingValue), we'll error out immediately to keep the two maps in sync, rather than adding the entry to one map before realizing that it conflicts with an existing mapping in the other. (We provide forcePut if you do want to override the existing value.) We provide similar safeguards for inserting null or other invalid values.
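Concretely (reusing the hypothetical phone-directory example):

BiMap<String, String> directory = HashBiMap.create();
directory.put("Alice", "555-1234");
directory.put("Bob", "555-1234");      // throws IllegalArgumentException:
                                       // "555-1234" is already bound to "Alice"
directory.forcePut("Bob", "555-1234"); // removes the "Alice" entry, then maps "Bob"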
BiMap views keep the two maps in sync: If you remove an element from the entrySet of the original BiMap, its corresponding entry is also removed from the inverse. We do the same kind of thing in Entry.setValue.
We handle serialization: A BiMap and its inverse stay "connected," and the entries are serialized only once.
We provide a smart implementation of inverse() so that foo.inverse().inverse() returns foo, rather than a wrapper of a wrapper.
We override values() to return a Set. This set is identical to what you'd get from inverse().keySet() except that it maintains the same iteration order as the original BiMap.
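In code, with the same hypothetical directory as above:

directory.put("Alice", "555-1234");

// inverse() is a live view; inverting twice returns the original instance
assert directory.inverse().inverse() == directory;

// values() is a Set, mirroring inverse().keySet() in the original's iteration order
Set<String> numbers = directory.values();

// Removing through a view keeps both directions in sync
directory.inverse().remove("555-1234"); // also removes the "Alice" entry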
This is actually more of a Lucene question, but it's in the context of a neo4j database.
I have a database that's divided into 50 or so node types (so "collections" or "tables" in other kinds of DBs). Each has a subset of properties that need to be indexed; some share the same name, some don't.
When searching, I always want to find nodes of a specific type, never across all nodes.
I can see three ways of organizing this:
One index per type; properties map naturally to index fields: index 'foo', 'id'='1234'.
A single global index; each field maps to a property name. To distinguish the type, either include it as part of the value ('id'='foo:1234') or check the nodes once they're returned (I expect duplicates to be very rare).
A single index; the type is part of the field name: 'foo.id'='1234'.
Once created, the database is read-only.
Are there any benefits to one of those, in terms of convenience, size/cache efficiency, or performance?
As I understand it, for the first option neo4j will create a separate physical index for each type, which seems suboptimal. For the third, I end up with most Lucene docs having only a small subset of the fields; I'm not sure if that affects anything.
I came across this problem recently when I was building an ActiveRecord connection adapter for Neo4j over REST, to be used in a Rails project. Since both ActiveRecord and ActiveRelation are tightly coupled to SQL syntax, it became difficult to fit everything into NoSQL. It might not be the best solution, but here's how I solved it:
I created an index named model_index which indexes nodes under two keys, type and model.
An index lookup with the type key currently happens with just one value, model. This was introduced primarily to achieve SHOW TABLES SQL functionality, which can get me a list of all models present in the graph.
An index lookup with the model key takes place with values corresponding to the different model names in my system. This is primarily for achieving DESC <TABLENAME> functionality.
With each table creation (as in CREATE TABLE), a node is created, with the table-definition attributes stored as node properties.
The created node is indexed under model_index with type:model and model:<model-name>. This includes the newly created model in the list of 'tables' and also allows one to reach the model node directly via an index lookup with the model key.
For each record created per model (type, in your case), an outgoing edge labeled instances is created, directed from the model node to the new record: v[123] :=> [instances] :=> v[245], where v[123] represents the model node and v[245] represents a record of v[123]'s type.
Now if you want to get all instances of a specified type, you can look up model_index with model:<model-name> to reach the model node and then fetch all adjacent nodes over outgoing edges labeled instances. Filtered lookups can further be achieved by applying filters and other complex traversals.
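In the legacy embedded Java API, the setup and lookup sketched above might look roughly like this (model_index and instances come from the description; everything else is illustrative):

import org.neo4j.graphdb.*;
import org.neo4j.graphdb.index.Index;

// graphDb is an already-opened GraphDatabaseService
Transaction tx = graphDb.beginTx();
try {
    Index<Node> modelIndex = graphDb.index().forNodes("model_index");

    // Create and index the model node under both keys
    Node userModel = graphDb.createNode();
    modelIndex.add(userModel, "type", "model");
    modelIndex.add(userModel, "model", "User");

    // Attach a record to its model via an outgoing "instances" edge
    Node record = graphDb.createNode();
    userModel.createRelationshipTo(record,
            DynamicRelationshipType.withName("instances"));
    tx.success();
} finally {
    tx.finish();
}

// All instances of "User": one index lookup plus a single-level traversal
Node model = graphDb.index().forNodes("model_index")
        .get("model", "User").getSingle();
for (Relationship r : model.getRelationships(Direction.OUTGOING,
        DynamicRelationshipType.withName("instances"))) {
    Node instance = r.getEndNode(); // one record of type "User"
}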
The above solution keeps model_index from clogging up, since it contains only two entries per model, and it achieves an effective record lookup via one index lookup and a single-level traversal.
Although in your case nodes of different types are not adjacent to each other, even if you wanted them to be, you could determine the type of any arbitrary node by simply looking up its adjacent node over an incoming edge labeled instances. Further, I'm considering incorporating SpringDataGraph's pattern of storing a __type__ property on each instance node to avoid this adjacent-node lookup.
I'm currently translating AREL to Gremlin scripts for almost everything. You can find the source code for my AR adapter at https://github.com/yournextleap/activerecord-neo4j-adapter
Hope this helps, Cheers! :)
A single index will be smaller than several little indexes, because some data, such as the term dictionary, will be shared. However, since a term dictionary lookup is an O(log n) operation, a lookup in a bigger term dictionary might be a little slower. (With 50 indexes, this would only require about 6 more comparisons, since 2^6 >= 50; you likely won't notice any difference.)
Another advantage of a smaller index is that the OS cache is likely to make queries run faster.
Instead of your options 2 and 3, I would index two different fields, id and type, and search for (id:ID AND type:TYPE), but I don't know whether that is possible with neo4j.
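In raw Lucene, that combined query would look something like this (field names taken from the question; the searcher setup is omitted):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("id", "1234")), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("type", "foo")), BooleanClause.Occur.MUST);
// searcher.search(query, 10) then returns only "foo" nodes with that id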
spring-data-neo4j uses the first approach - it creates a different index for each type. So I guess that's a good option for the general scenario, but in your particular case it might be suboptimal, as you say. I'd run some benchmarks to measure the performance.
The other two, by the way, seem a bit artificial. You are possibly indexing completely unrelated information in the same index, which doesn't sound right.
In XML, which way is better: defining data in element content or in attributes? Thanks
<order id='' orderBy='' orderTime=''>
.....
</order>
or
<order id=''>
.....
<orderBy>.. </orderBy>
<orderTime>...</orderTime>
</order>
Depends on what the data will be. If the data are just simple strings, then the first one will be fine.
The second one is important to use if you might need to wrap the data in CDATA in the future. So if the data is long or may contain HTML someday, I suggest using the second form. If you look at Twitter's XML feed, you will notice that they prefer using an element for almost all of their properties. Using elements gives you flexibility if you later need to add multiple elements of the same type.
Here is an article you should read that discusses this topic.
My advice is to perhaps keep your id in an attribute but the rest in an element.
In general, I prefer to use elements instead of attributes, as elements can later be expanded upon (the "extensible" part of XML). However, when you have a rather specific data type as with both of your examples here, you're probably best off keeping them as attributes - especially if orderTime is a standard XML dateTime type.
I would like to add one more point: if CDATA is not an issue, then I prefer an attribute over an element. Order id, date, count, etc. should be attributes. Order items should be elements.
Generally speaking, you use an attribute if:
there is only one item (of that name) within the parent node
Otherwise, you use an element if:
there can be more than one item (of that name) within the node, or
the item is complex, i.e., may contain attributes or sub-elements itself
I know this isn't really what XPath is for, but if I have a HashMap of XPath expressions to values, how would I go about building an XML document? I've found dom4j's
DocumentHelper.makeElement(branch, xpath), except it is incapable of creating attributes or indexing. Surely a library exists that can do this?
Map<String, String> xMap = new HashMap<>();
xMap.put("root/entity/#att", "fooattrib");
xMap.put("root/array[0]/ele/#att", "barattrib");
xMap.put("root/array[0]/ele", "barelement");
xMap.put("root/array[1]/ele", "zoobelement");
would result in:
<root>
<entity att="fooattrib"/>
<array><ele att="barattrib">barelement</ele></array>
<array><ele>zoobelement</ele></array>
</root>
I looked for something similar a few years ago - a sort of writeable XPath. In the end, having not found anything, I hacked up something which essentially built up the XML document by adding new nodes to parent expressions:
parent="/" element="root"
parent="/root" element="entity"
parent="/root/entity" attribute="att" value="fooattrib"
parent="/root" element="array"
parent="/root" element="ele" text="barelement"
(This was itself governed by an XML configuration file, hence the attribute/value appearance above.)
It would be tempting to try to automate some of this, to just take the last path element and make something of it, but I always felt that there were XPath expressions I could write which such a dumbheaded approach would get wrong.
Another approach I considered, though did not implement (the above was "good enough"), was to use the excellent Jaxen to generate elements on the fly if they didn't already exist.
From the Jaxen FAQ:
The only thing required is an implementation of the interface org.jaxen.Navigator. Not all of the interface is required, and a default implementation, in the form of org.jaxen.DefaultNavigator is also provided.
The DOMWriterNavigator would wrap an existing DOMNavigator and then use the makeElement method if the element didn't exist. However, even with this approach, you'd probably have to do some pre/post-processing of the XPath query for things like attributes and text() functions.
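For what it's worth, a rough dom4j sketch of the map-driven approach from the question (using its #att convention) might look like this. makeElement and addAttribute are real dom4j calls, but the key-splitting is only illustrative, and this still won't create the indexed array[0]/array[1] siblings, which is exactly the gap described above:

import java.util.Map;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

static Document build(Map<String, String> xMap) {
    Document doc = DocumentHelper.createDocument();
    for (Map.Entry<String, String> e : xMap.entrySet()) {
        String path = e.getKey();
        int hash = path.lastIndexOf("/#");
        if (hash >= 0) {
            // "#att" marks an attribute on the element addressed by the prefix
            Element el = DocumentHelper.makeElement(doc, path.substring(0, hash));
            el.addAttribute(path.substring(hash + 2), e.getValue());
        } else {
            DocumentHelper.makeElement(doc, path).setText(e.getValue());
        }
    }
    return doc;
}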
The best I was able to come up with is to use a JAXB implementation, which will marshal/unmarshal objects to XML, and then use Dozer (http://dozer.sourceforge.net/documentation/mapbackedproperty.html) to map the XPaths, which were keys in a map, to the JAXB object setters.
<mapping type="one-way" map-id="TC1">
<class-a>java.util.Map</class-a>
<class-b>org.example.Foo</class-b>
<field>
<a key="root/entity/#att">this</a>
<b>Foo.entity.att</b>
<a-hint>java.lang.String</a-hint>
</field>
</mapping>
It's more of a two step solution, but really worked for me.
I also had the same kind of requirement, where the structure is so dynamic that I didn't want to use XSLT or any object-mapping frameworks, so I implemented this in Java and wrote a blog post on it. Please visit
http://ganesh-kandisa.blogspot.com/2013/08/dynamic-xml-transformation-in-java.html
or fork the code at the Git repository:
https://github.com/TheGanesh/DynamicXMLTransformer