I have an org.dom4j.Document instance (a DefaultDocument implementation, to be specific). I would like to insert a new node just before another one. I don't really understand the dom4j API, and I am confused by the differences between Element and DOMElement and so on.
org.dom4j.dom.DOMElement.insertBefore is not working for me because the Node I have is not a DOMElement. DOMNodeHelper.insertBefore is no good either, because I have org.dom4j.Node instances and not org.w3c.dom.Node instances. OMG.
Could you give me a little code snippet that does this job for me?
This is what I have now:
// puts lr's at the very end of the XML, but I'd like to put them before 'e'
for (Element lr : loopResult) {
    e.getParent().add(lr);
}
It's an "old" question, but the answer may still be relevant. One problem with the DOM4J API is that there are too many ways to do the same thing; too many convenience methods, with the effect that you cannot see the forest for the trees. In your case, you should get a List of child elements and insert your element at the desired position. Something like this (untested):
// get a list of e's sibling elements, including e
List elements = e.getParent().elements();
// insert new element at e' position, i.e. before e
elements.add(elements.indexOf(e), lr);
Lists in DOM4J are live lists, i.e. a mutating list operation affects the document tree and vice versa.
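Applied to the loop from the question, and relying on the live-list behaviour just described, an untested sketch would be:

// get the live list of e's sibling elements (including e) and insert before e, keeping order
List elements = e.getParent().elements();
int insertAt = elements.indexOf(e);
for (Element lr : loopResult) {
    elements.add(insertAt++, lr);   // mutating the live list also updates the document tree
}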
As a side note, DOMElement and all the other classes in org.dom4j.dom are a DOM4J implementation that also supports the w3c DOM API. This is rarely needed (I would not have put it, and a bunch of the other "esoteric" packages like bean, datatype, jaxb, swing etc., in the same distribution unit). Concentrate on the core org.dom4j, org.dom4j.tree, org.dom4j.io and org.dom4j.xpath packages.
Related
I would like to understand why I have to do the following when I want to access a specific element value during XML parsing:
NodeList controlList = poDoc.getElementsByTagName("control");
Node controlNode = controlList.item(0);
Element controlElem = (Element) controlNode;
usageType = controlElem.getElementsByTagName("usage_type").item(0).getFirstChild().getNodeValue();
Here I have to cast the controlNode to (Element), only because I want to access another element deeper down the DOM tree. This is all working as expected; I just would like to understand why it is this way. Why can't there be a getElementsByTagName or similar call for the Node object? Or is there one and I just don't know it? Since I'm quite new to Java, this might be the case. There surely is a better reason for this than "because that's the way the interface was implemented".
Only documents and elements can contain elements.
So the designers of the DOM API simply decided to define the method
getElementsByTagName only in the Node classes Document and Element.
An alternative design would have been to define getElementsByTagName in the Node class and return an empty node list if the node cannot contain elements. (This is roughly the design decision made by the XPath spec).
By XML standards, every entity in an XML document is a Node, but not everything in an XML document can have child elements. The parser can't know whether a referenced Node is a header, an element, or even a comment, so it would be unwise to have such a method without first checking its type.
Even if you expect your XML to be formatted a certain way, it's typical to check if a Node is actually an Element, for example:
if (node instanceof Element) {
    NodeList usagetypes = ((Element) node).getElementsByTagName("usage_types");
    ...
}
According to the Javadoc, a Node is any piece of data that can exist in an XML document, including comments, headers, and text (the text value of an XML element), so not all kinds of nodes can have a "name" or child elements.
An Element is the kind of node that may have child elements, which can be retrieved by name.
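To make that concrete, a typical (untested) pattern with the org.w3c.dom API is to check the node type before casting, for example when walking the children of the controlElem from the question:

NodeList children = controlElem.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    if (child.getNodeType() == Node.ELEMENT_NODE) {
        // only element nodes reach this point, so the cast is safe
        Element childElem = (Element) child;
        System.out.println(childElem.getTagName());
    }
}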
Let me ask this question using the Java code mentioned in the linked question.
In the same linked question, a supplementary answer says: "In general, we need to separate the List and Node classes: your List should have a head, and your Node will have an item and a next pointer."
In addition, my class lecture says there are two reasons for separating the List and Node classes.
Reason 1:
Assume user X and user Y are both pointing to the first List_Node in that list.
After user X adds a soap item to the shopping list, the situation is as shown below.
So user Y is now inconsistent with this situation.
Reason 2:
Handling an empty list.
If user X points to an empty list, that means X is null.
X.nth(1); // NullPointerException
My question:
Reason 1 could have been handled by inserting the new node after the last node. Reason 2 could have been handled as part of an error check in the code.
So why exactly do we need to separate the Node and List classes?
Note: the Java code has an item of type int instead of type Object that could accommodate strings. I did not want to change this code again.
Reason 1 could have been handled by inserting the new node after the last node.
But that is changing the problem. Lists are ordered. Adding an element before an existing element or after an existing element are different operations. The data structure must be able to handle both operations, otherwise it is not a proper list.
Reason 2 could have been handled as part of an error check in the code.
That wouldn't work. The code of your list abstraction can't handle the NPE. If you attempt to call x.nth(1) and x is null, the exception is thrown before you get into any of the code that implements the list. Therefore, the (hypothetical) error handling in the list code cannot be executed. (Java exception handling doesn't work like that ...)
And as you correctly point out in your comment, forcing code that uses a list to handle empty lists as a special case would be bad list API design.
In short, both of the reasons stated are valid. (IMO)
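To make that concrete, here is a minimal sketch (class and method names are mine, not from the lecture) of the separation: the List object stays the same for every user even when the head changes, and an empty list is simply a List whose head is null rather than a null reference:

class ListNode {
    int item;
    ListNode next;

    ListNode(int item, ListNode next) {
        this.item = item;
        this.next = next;
    }
}

class IntList {
    private ListNode head;   // a null head means "empty list", not "no list at all"

    void addFirst(int item) {
        // user X and user Y hold a reference to the same IntList object,
        // so an insert at the front is visible to both of them
        head = new ListNode(item, head);
    }

    int nth(int n) {
        // the list code itself handles the empty/short case; the caller never touches a raw null
        ListNode cur = head;
        for (int i = 1; i < n && cur != null; i++) {
            cur = cur.next;
        }
        if (cur == null) {
            throw new IndexOutOfBoundsException("no element " + n);
        }
        return cur.item;
    }
}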
Here are some very good reasons:
Separate implementation from interface. Perhaps in the future someone will find a perfectly good new implementation of your list involving a row of carrier pigeons moving elements around. Should your client code have to update everything with methods flapWings and rewardPigeon instead of manipulating nodes? No! (More realistically, Java offers ArrayList and LinkedList, IIRC.)
It makes more sense. list.reverse() makes sense. node.reverse()... what? Does this change every other node recursively or not?
Speaking of that reverse method: if you implement it right now, you can implement it in O(N). Did you know that if you keep an int orientation that is 1 or -1, you can implement it in O(1)? (A rough sketch follows after this list.) But all subsequent operations need to use that bit, so it's a meta-node operation, not a node operation.
Empty lists are possible. Empty nodes don't make sense. This is your (2). Imagine this: I am a client of your software. You have list == node. A rival vendor offers the separation. You say "oh, it works fine, just add an error check in your code." Your rival does not have to say that. Who do I buy from? This is a thought experiment meant to convince you these really are different, and that the former design does have a defect.
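As a very rough sketch of that orientation trick (my own names, assuming a doubly linked list that also keeps a tail reference): reverse() only flips a flag, and every traversal has to consult it:

class DoublyLinkedList {
    private Node head, tail;
    private int orientation = 1;   // 1 = forward, -1 = reversed

    void reverse() {               // O(1): just flip the flag
        orientation = -orientation;
    }

    Node first() {
        return orientation == 1 ? head : tail;
    }

    Node next(Node n) {            // every traversal consults the orientation flag
        return orientation == 1 ? n.next : n.prev;
    }

    static class Node {
        int item;
        Node next, prev;
    }
}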
The problem is I don't understand how to create a tree. I have gone through many code examples on trees, but I don't even know how to work with or handle a node, and hence I don't understand how the Node class works (the one present in all the example programs). When I try to use methods such as appendChild (as mentioned in the Java docs), I get an error and am asked to create such an appendChild method inside the Node class within the main program. I couldn't understand why that happened.
I am given integer pairs ((u,v), meaning there is an edge between u and v) of nodes, and I also need to know whether any Element-to-Node conversion is required for using u and v (of type integer) as nodes.
Please bear with me, since my basics are weak. A little explanation of how the entire thing works would be very helpful.
Thank you.
EDIT 1: I went through the following links (and hardly found anything on plain unordered trees): http://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/code/BST.java and http://www.newthinktank.com/2013/03/binary-tree-in-java/. I tried to modify these code examples to meet my own purpose, but failed.
I only got a blurry idea, and that is not enough for an implementation. I am trying to make a simple unordered tree, for which I am given (u, v) pairs like:
(4,5) (5,7) (5,6). I just need to join (4<--5), (5<--7) and (5<--6). So how do I write a Node class that only joins one node to the previous node? Besides, to do only this, do I need to bother with leftChild and rightChild? If not, how will I be able to traverse the tree and do operations such as height and diameter calculation later?
Thank you for your patience.
Well, it's not entirely clear whether you want an explanation of tree creation in general, of some tree implementation you have found, or whether you have basic code already that you cannot get working. You might want to clarify that :).
Also, on tree creation in general:
Most explanations and implementations you will find might be "overly" elegant :). So try to imagine a simple linked list first. In that list you have nodes (the elements of the list). A node contains some valuable data and a reference to another node object. A very simple tree is different only in that a node has more than one reference to other nodes. For example, it has a Set of references: its "children".
Here is a VERY crude code example of addChild() (or appendChild() in your case):
import java.util.HashSet;
import java.util.Set;

public class Node {

    private String valueableData;
    private Set<Node> children;

    public Node() {
        this.children = new HashSet<Node>();
    }

    public Node(String valueableData) {
        this.valueableData = valueableData;
        this.children = new HashSet<Node>();
    }

    public void addChild(Node node) {
        children.add(node);
    }
}
Now, this implementation would be quite horrible (I even deleted the setters/getters); it would also be wise to keep a reference to the root node in some cases, and so on. But you should get the basic idea.
You might want to write a loop or a recursion to go over the (u,v) integer pairs. You create the root node first and then just addChild() all the other nodes recursively, or you create every node first and then setChild() them according to your rules.
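For the (u, v) pairs from the question, one possible (untested) way to drive the addChild() method above is to keep a java.util.HashMap from the integer label to its Node, converting the int labels to Strings because the crude Node class above stores a String:

Map<Integer, Node> nodes = new HashMap<>();
int[][] pairs = { {4, 5}, {5, 7}, {5, 6} };   // read (u, v) as "v is a child of u", i.e. 4 <-- 5
for (int[] pair : pairs) {
    Node parent = nodes.computeIfAbsent(pair[0], k -> new Node(String.valueOf(k)));
    Node child = nodes.computeIfAbsent(pair[1], k -> new Node(String.valueOf(k)));
    parent.addChild(child);
}
Node root = nodes.get(4);   // 4 never appears as a child in these pairs, so it is the root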
I am working on a problem where I'm required to store elements with two requirements: no duplication and maintained insertion order. I chose to go with LinkedHashSet, since it fulfills both of my requirements.
Let's say I have this code:
LinkedHashSet<String> hs = new LinkedHashSet<String>();
hs.add("B");
hs.add("A");
hs.add("D");
hs.add("E");
hs.add("C");
hs.add("F");
if(hs.contains("D")){
//do something to remove elements added after "D", i.e. remove "E", "C" and "F"
//maybe hs.removeAll(Collection<?> c) ??
}
Can anyone please guide me through the logic to remove these elements?
Am I using the wrong data structure? If so, what would be a better alternative?
I think you may need to use an iterator to do the removal if you are using a LinkedHashSet. That is to say find the element, then keep removing until you get to the tail. This will be O(n), but even if you wrote your own LinkedHashSet (with a doubly linked list and hashset) you would have access to the raw linking structure so that you could cut the linked list in O(1), but you would still need to remove all elements that you just cut from the linked list from the HashSet which is where the O(n) cost would arise again.
So in summary, remove the element, then keep an iterator to that element and continue to walk down removing elements until you get to the end. I'm not sure if LinkedHashSet exposes the required calls, but you can probably figure that out.
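A rough sketch of that idea with the standard Iterator API (untested) might be:

Iterator<String> it = hs.iterator();
boolean afterD = false;
while (it.hasNext()) {
    String s = it.next();
    if (afterD) {
        it.remove();       // drop everything that was inserted after "D"
    } else if (s.equals("D")) {
        afterD = true;     // keep "D" itself, start removing from the next element on
    }
}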
You could write your own version of an ArrayList that doesn't allow duplicates, by overriding add() and addAll(). To my knowledge, there is no "common" 3rd-party version of such a class, which has always surprised me. Anybody know of one?
Then the removal code is pretty simple (no need to use a ListIterator):
int idx = this.indexOf("D");
if (idx >= 0) {
    for (int goInReverse = this.size() - 1; goInReverse > idx; goInReverse--) {
        this.remove(goInReverse);
    }
}
However, this is still O(N), because you loop through every element of the list.
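For what it's worth, a bare-bones sketch of such a subclass (the class name is mine, untested) could be:

import java.util.ArrayList;
import java.util.Collection;

public class UniqueArrayList<E> extends ArrayList<E> {

    @Override
    public boolean add(E e) {
        // reject duplicates, otherwise delegate to ArrayList
        return !contains(e) && super.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) {
            changed |= add(e);
        }
        return changed;
    }
}

Note that contains() is O(N) on an ArrayList, so every add becomes linear, and a complete version would also have to override add(int, E), set(int, E) and addAll(int, Collection) to keep the no-duplicates guarantee.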
The basic problem here is that you have to maintain two data structures: a "map" one representing the key/value mapping, and a "list" one representing the insertion order.
There are "map" and "list" organizations that offer fast removal of elements after a given point; e.g. ordered trees of various kinds, and both array- and chain-based lists (modulo the cost of locating the point).
However, it seems impossible to remove N elements from the two data structures in better than O(N). You have to visit all of the elements being removed to remove them from the 2nd data structure. (In fact, I suspect one could prove this mathematically ...)
In short, there is no data structure that has better complexity than what you are currently using.
The area where it is possible to improve performance (with a custom collection class!) is in avoiding an explicit use of an iterator. Using an iterator and the standard iterator API, the cost is O(N) on the total number of elements in the data structure. You could make this O(N) on the number of elements removed ... if the hash entry nodes also had next/prev links for the sequence.
So, after trying a couple of the things mentioned above, I chose to implement a different data structure, since I did not have any issue with the O(n) cost for this problem (my data set is very small).
I used graphs; this library came in really handy: http://jgrapht.org/
What I am doing is adding all elements as vertices to a DirectedGraph and also creating edges between them (the edges helped me solve another, unrelated problem as well). And when it's time to remove the elements, I use a recursive function with the following pseudocode:
removeElements(element) {
    tempEdge = graph.getOutgoingEdgeFrom(element)
    if (tempEdge == null)
        return
    tempVertex = graph.getTargetVertex(tempEdge)
    removeElements(tempVertex)
    graph.remove(tempVertex)
}
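In case it helps anyone, a rough (untested) translation of that pseudocode to JGraphT's actual API could look like the sketch below; the method names come from JGraphT's Graph interface, and graph is assumed to be a directed graph with String vertices and DefaultEdge edges:

void removeElements(String element) {
    // copy the outgoing edges first, because removing vertices mutates the graph while we iterate
    Set<DefaultEdge> outgoing = new HashSet<>(graph.outgoingEdgesOf(element));
    for (DefaultEdge edge : outgoing) {
        String target = graph.getEdgeTarget(edge);
        removeElements(target);        // recurse first, so the deepest vertices are removed first
        graph.removeVertex(target);    // removeVertex also removes the edges touching it
    }
}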
I agree that a graph data structure is not a natural fit for this kind of problem, but under my conditions it works perfectly... Cheers!
I'm creating an API that encapsulates JPA objects with additional properties and helpers. I do not want the users to access the database, because I have to provide certain querying functionality for the consumers of the API.
I have the following:
Node1(w/ attributes) -- > Edge1(w/ attr.) -- > Node2(w/ attr.)
and
Node1(w/ attributes) -- > |
Node2(w/ attributes) -- > | -- > HyperEdge1(w/ attr.)
Node3(w/ attributes) -- > |
Basically a Node can be of a certain type, which would dictate the kind of attributes available. So I need to be able to query these "paths" depending on different types and attributes.
For example: Start from a Node, and find a path typeA > typeB & attr1 > typeC.
So I need to do something simple, and be able to write the query as a string, or maybe a builder pattern style.
What I have so far, is a visitor pattern set up to traverse the Nodes/Edges/HyperEdges, and this allows for a sort of querying, but it's not very simple, since you have to create a new visitor for new types of queries.
This is my implementation so far:
ConditionImpl hasMass = ConditionFactory.createHasMass( 2.5 );
ConditionImpl noAttributes = ConditionFactory.createNoAttributes();
List<ConditionImpl> conditions = new ArrayList<ConditionImpl>();
conditions.add( hasMass );
conditions.add( noAttributes );
ConditionVisitor conditionVisitor = new ConditionVisitor( conditions );
node.accept( conditionVisitor );
List<Set<Node>> validPaths = conditionVisitor.getValidPaths();
The code above does a query that checks whether the starting node has a mass of 2.5 and a linked node (child) has no attributes. The visitor does a condition.check(Node) and returns a boolean.
Where do I start with creating a querying language for a graph that is simpler?
Note: I do not have the option of using an existing graph library and I will have hundreds of thousands of nodes, plus the edges..
Personally, I like the idea of the visitor pattern; however, it might turn out to be too expensive to visit all nodes.
Query Interface: If users / other developers are using it, I would use a builder style interface, with readable method names:
Visitor v = QueryBuilder
.selectNodes(ConditionFactory.hasMass(2.5))
.withChildren(ConditionFactory.noAttributes())
.buildVisitor();
node.accept(v);
List<Set<Node>> validPaths = v.getValidPaths();
As pointed out above, this is more or less just syntactic sugar for what you already have (but sugar makes all the difference). I would separate the code for "moving on the graph" (like "check whether visited node fulfills condition" or "check whether connected nodes fulfill condition") from the code that actually checks (or is) a condition. Also, use composites on conditions to build and/or:
// Select nodes with mass 2.5, follow edges with both conditions fulfilled and check that the children on these edges have no attributes.
Visitor v = QueryBuilder
.selectNodes(ConditionFactory.hasMass(2.5))
.withEdges(ConditionFactory.and(ConditionFactory.freestyle("att1 > 12"), ConditionFactory.freestyle("att2 > 23")))
.withChildren(ConditionFactory.noAttributes())
.buildVisitor();
(I used "freestyle" because of missing creativity right now, but its intention should be clear.) Note that in general this might be two different interfaces, in order to avoid building strange queries.
public interface QueryBuilder {
    QuerySelector selectNodes(Condition c);
    QuerySelector allNodes();
}

public interface QuerySelector {
    QuerySelector withEdges(Condition c);
    QuerySelector withChildren(Condition c);
    QuerySelector withHyperChildren(Condition c);
    // ...
    QuerySelector and(QuerySelector... selectors);
    QuerySelector or(QuerySelector... selectors);
    Visitor buildVisitor();
}
Using this kind of syntactic sugar makes the queries readable from the source code without forcing you to implement your own data query language. The QuerySelector implementations would then be responsible for "moving" around the visited nodes, whereas the Condition implementations would check whether the conditions match.
The clear downside of this approach is that you need to foresee most of the kinds of queries in the interfaces and implement them up front.
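For the and/or composites mentioned above, the Condition side could look roughly like this minimal, untested sketch; the check(Node) signature is taken from the condition.check(Node) call in the question, everything else is an assumption:

interface Condition {
    boolean check(Node node);
}

class ConditionFactory {

    static Condition and(final Condition... conditions) {
        return new Condition() {
            public boolean check(Node node) {
                // all sub-conditions must hold
                for (Condition c : conditions) {
                    if (!c.check(node)) {
                        return false;
                    }
                }
                return true;
            }
        };
    }

    static Condition or(final Condition... conditions) {
        return new Condition() {
            public boolean check(Node node) {
                // at least one sub-condition must hold
                for (Condition c : conditions) {
                    if (c.check(node)) {
                        return true;
                    }
                }
                return false;
            }
        };
    }
}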
Scalability with the number of nodes: You might need to add some kind of index to speed up finding "interesting" nodes. One idea that comes to mind is to add (for each index) a layer to the graph in which each node models one of the different attribute settings for the "indexed variable". The normal edges could then connect these index nodes with the nodes in the original graph, and the hyper edges on the index could then build a network that is smaller to search. Of course, there is still the boring way of storing the index in a map-like structure with an attributeValue -> node mapping, which is probably much more performant than the idea above anyway.
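The map-based variant could be as simple as the following sketch (names and the getAttribute accessor are hypothetical, not part of your API):

// one index per indexed attribute: attribute value -> nodes carrying that value
Map<Object, Set<Node>> massIndex = new HashMap<>();

void indexNode(Node node) {
    Object mass = node.getAttribute("mass");   // hypothetical accessor on your Node class
    if (mass != null) {
        massIndex.computeIfAbsent(mass, k -> new HashSet<>()).add(node);
    }
}

// at query time, hand only the indexed candidates to the visitor instead of the whole graph
Set<Node> candidates = massIndex.getOrDefault(2.5, Collections.emptySet());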
If you have some kind of index, make sure that the index can also receive a visitor, so that the visitor does not have to visit all nodes in the graph.
It sounds like you have all the pieces except some syntactic sugar.
How about an immutable style where you create the whole list above like
Visitor v = Visitor.empty
.hasMass(2.5)
.edge()
.node()
.hasNoAttributes();
You can create any kind of linear query pattern using this style; and if you add some extra state, you could even do branching queries, e.g. by calling setName("A") and later .node("A") to return to that point of the query.