Java DOM: How to get how many child elements - java

I have an XML Document:
<entities xmlns="urn:yahoo:cap">
<entity score="0.988">
<text end="4" endchar="4" start="0" startchar="0">Messi</text>
<wiki_url>http://en.wikipedia.com/wiki/Lionel_Messi</wiki_url>
<types>
<type region="us">/person</type>
</types>
</entity>
</entities>
I have a TreeMap<String,String> data which stores the getTextContent() for both the "text" and "wiki_url" elements. Some "entity" elements will only have the "text" element (no "wiki_url"), so I need a way of finding out when the text element is the only child and when there is also a "wiki_url". I could use document.getElementsByTagName("text") and document.getElementsByTagName("wiki_url"), but then I would lose the relationship between the text and the URL.
I'm trying to get the number of elements within the "entity" element by using:
NodeList entities = document.getElementsByTagName("entity"); //List of all the entity nodes
int nchild; //Number of children
System.out.println("Number of entities: "+ entities.getLength()); //Prints 1 as expected
nchild=entities.item(0).getChildNodes().getLength(); //Returns 7
However, as shown above, this returns 7 (which I don't understand; surely it's 3, or 4 if you include the grandchild).
I was then going to use the number of children to loop through them all, check whether getNodeName().equals("wiki_url"), and save the value to data if so.
Why am I getting 7 as the number of children when I can only count 3 children and 1 grandchild?

The whitespace following the > of <entity score="0.988"> also counts as a node; likewise, the end-of-line characters between the tags are parsed into text nodes. That is why you get 7: three element nodes plus the four whitespace text nodes around them. If you are interested in a particular node with a given name, add a helper method like the one below and call it wherever you need it.
Node getChild(final NodeList list, final String name)
{
    for (int i = 0; i < list.getLength(); i++)
    {
        final Node node = list.item(i);
        if (name.equals(node.getNodeName()))
        {
            return node;
        }
    }
    return null;
}
and call
final NodeList childNodes = entities.item(0).getChildNodes();
final Node textNode = getChild(childNodes, "text");
final Node wikiUrlNode = getChild(childNodes, "wiki_url");
Normally, when working with DOM, it helps to come up with helper methods like the one above to simplify the main processing logic.
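To make the count of 7 concrete, here is a minimal sketch that parses the same document and counts the child nodes both ways. The class name ChildElementCounter and its methods are made up for illustration; only the standard JAXP and org.w3c.dom calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original question or answer.
class ChildElementCounter {

    // Same shape as the document in the question, line breaks included.
    static final String XML =
        "<entities xmlns=\"urn:yahoo:cap\">\n"
      + "<entity score=\"0.988\">\n"
      + "<text end=\"4\" endchar=\"4\" start=\"0\" startchar=\"0\">Messi</text>\n"
      + "<wiki_url>http://en.wikipedia.com/wiki/Lionel_Messi</wiki_url>\n"
      + "<types>\n<type region=\"us\">/person</type>\n</types>\n"
      + "</entity>\n"
      + "</entities>";

    // Parses an XML string; checked exceptions are wrapped to keep callers short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Counts only the children that are elements, skipping whitespace text nodes.
    static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node entity = parse(XML).getElementsByTagName("entity").item(0);
        System.out.println("all child nodes: " + entity.getChildNodes().getLength());
        System.out.println("element children: " + countElementChildren(entity));
    }
}
```

Filtering on getNodeType() leaves the parser configuration alone, so it behaves the same whether or not the input happens to be pretty-printed.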

Related

How to determine the class type of a sub-class in Java

I have an XML file with the following elements.
<productType>
<productTypeX />
<!-- One of the following elements are also possible:
<productTypeY />
<productTypeZ />
-->
</productType>
So, the XML could also look like this:
<productType>
<productTypeZ />
</productType>
The XML is unmarshalled to a POJO by using JAXB.
How can I determine if the child of <productType> is X, Y or Z? Either in the mapped POJO or directly in the XML?
There is a way, though it may be no cheaper than checking by hand (writing an if for every sub-class getter, e.g. null == obj.getProductTypeX()). Here it is:
Let's assume that you end up with a JAXBElement<ProductType> productType when you unmarshal.
Now you need to end up with an Element (org.w3c.dom.Element) object, which can be done like this:
DOMResult res = new DOMResult();
marshaller.marshal(productType, res);
Element elt = ((Document)res.getNode()).getDocumentElement();
Since the Element interface extends the Node interface, what we have here is a tree structure, and we can get its existing children like this:
NodeList nodeList = elt.getChildNodes();
Now you can check the type and value of every Node. The child list also contains text and comment nodes (attributes are never returned by getChildNodes()), so in most cases you have to check that the Node is an ELEMENT_NODE first:
for (int i = 0; i < nodeList.getLength(); i++) {
    Node currentNode = nodeList.item(i);
    if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
        currentNode.getNodeName();
        currentNode.getTextContent();
        //And whatever you like
    }
}
I hope this will help you or give you any directions how to get what you need.
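If reading the XML directly is acceptable, the same ELEMENT_NODE check can be applied without JAXB at all. The following is only a sketch under that assumption: ProductTypeInspector, WITH_COMMENT and firstChildElementName are hypothetical names, and it parses the raw XML rather than marshalling a POJO back into a DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; it bypasses JAXB and inspects the XML directly.
class ProductTypeInspector {

    // The first sample from the question, including the XML comment.
    static final String WITH_COMMENT =
        "<productType>\n"
      + "<productTypeX />\n"
      + "<!-- One of the following elements are also possible:\n"
      + "<productTypeY />\n"
      + "<productTypeZ />\n"
      + "-->\n"
      + "</productType>";

    // Returns the name of the first child element of the root, or null if there is none.
    static String firstChildElementName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Whitespace text nodes and the comment node are skipped here.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    return child.getNodeName();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstChildElementName(WITH_COMMENT));
        System.out.println(firstChildElementName("<productType><productTypeZ /></productType>"));
    }
}
```

The element names inside the comment are never reported, because a comment is a single COMMENT_NODE rather than a set of elements.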

How can I traverse xml nodes without knowing its schema

I know I can use DocumentBuilder to parse an xml file and traverse through the nodes but I am stuck at figuring out if the node has any more children. So for example in this xml:
<MyDoc>
<book>
<title> ABCD </title>
</book>
</MyDoc>
if I do node.hasChildNodes() I get true for both book and title. But what I am trying to do is if a node has some text value (not attributes) like title then print it otherwise don't do anything. I know this is some simple check but I just can't seem to find the answer on web. I am probably not searching with right keywords. Thanks in advance.
Try getChildNodes(). That will return a NodeList object which allows you to iterate through all of the nodes under the one you're referencing, regardless of what names they might have.
You have to check the type of the child nodes returned by getChildNodes() by calling getNodeType() on each of them. <book> has a child of type ELEMENT_NODE (plus whitespace TEXT_NODE children), whereas <title> only has a child of type TEXT_NODE.
I am not sure, but I think you wanted a way to iterate through all of the elements regardless of how deeply they are nested. The code below recursively goes through all nodes and prints a node's value as long as it's not just whitespace:
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse("test.xml");
    NodeList childNodes = doc.getChildNodes();
    iterateNodes(childNodes);
}
private static void iterateNodes(NodeList childNodes)
{
    for (int i = 0; i < childNodes.getLength(); ++i)
    {
        Node node = childNodes.item(i);
        String text = node.getNodeValue();
        if (text != null && !text.trim().isEmpty()) {
            System.out.println(text);
        }
        if (node.hasChildNodes()) {
            iterateNodes(node.getChildNodes());
        }
    }
}
Text nodes exist under element nodes in a DOM, and data is always stored in text nodes. Perhaps the most common error in DOM processing is to navigate to an element node and expect it to contain the data that is stored in that element. Not so! Even the simplest element node has a text node under it that contains the data.
Ref: http://docs.oracle.com/javase/tutorial/jaxp/dom/readingXML.html
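One possible way to combine these answers is to treat an element as a "leaf" when none of its children are elements, and to report its text only then. This is a sketch rather than a definitive solution; LeafTextCollector and its methods are invented names, and the name=text output format is an arbitrary choice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class LeafTextCollector {

    // Returns "name=text" for every element that holds text but no child elements.
    static List<String> collectLeafTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            walk(doc.getDocumentElement(), result);
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    private static void walk(Node element, List<String> result) {
        boolean hasElementChild = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                walk(child, result); // recurse into nested elements
            }
        }
        if (!hasElementChild) {
            // Only text (or nothing) underneath: report it unless it is blank.
            String text = element.getTextContent().trim();
            if (!text.isEmpty()) {
                result.add(element.getNodeName() + "=" + text);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<MyDoc>\n<book>\n<title> ABCD </title>\n</book>\n</MyDoc>";
        System.out.println(collectLeafTexts(xml));
    }
}
```

With this approach hasChildNodes() is not needed: <book> is skipped because it has an element child, while <title> is reported because everything under it is text.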

Java XPath : iterating over a collection of nodes and their indices

I have this XML instance document:
<entities>
<person>James</person>
<person>Jack</person>
<person>Jim</person>
</entities>
And with the following code I iterate over the person nodes and print their names:
XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
    Node node = nodes.item(i);
    String nodeName = node.getNodeName();
    String name = xpath.compile("text()").evaluate(node).trim();
    System.out.printf("node name = %s, text = %s\n", nodeName, name);
}
Now what I would like is to also have access to the index of each node.
I know I can trivially get it from the i loop variable but I want to get it as an XPath expression instead, preferably in no different way than I get the value of the text() XPath expression.
My use-case is that I am trying to handle all attributes I collect as XPath expressions (which I load at run-time from a config file) so that I minimize non-generic code, so I don't want to treat the index as a special case.
You'd have to use a trick like counting the preceding siblings
count(preceding-sibling::person)
which gives 0 for the first person, 1 for the second one, etc.
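Try using position(). Note, however, that when it is evaluated against a single node passed in as the context, as below, the context position is 1 for every node, so it only yields a useful index inside the expression that selects the nodes: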
String index = xpath.compile("position()").evaluate(node).trim();
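As a rough illustration of the preceding-sibling approach, the sketch below evaluates count(preceding-sibling::person) against each node in the same way as any other per-node expression. PersonIndexDemo and personIndices are hypothetical names; the result is requested as XPathConstants.NUMBER so that no string-to-number parsing is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for illustration.
class PersonIndexDemo {

    static final String XML =
        "<entities>\n<person>James</person>\n<person>Jack</person>\n<person>Jim</person>\n</entities>";

    // Returns the zero-based index of every person, computed purely with XPath.
    static List<Integer> personIndices(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("/entities/person", doc, XPathConstants.NODESET);
            List<Integer> indices = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // The expression is relative, so it is evaluated against the current person.
                Double preceding = (Double) xpath.evaluate(
                        "count(preceding-sibling::person)", nodes.item(i), XPathConstants.NUMBER);
                indices.add(preceding.intValue());
            }
            return indices;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(personIndices(XML));
    }
}
```

Because the index is just another XPath expression, it could be loaded from the same run-time configuration as the other attribute expressions; use count(preceding-sibling::person) + 1 if a one-based index is preferred.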

How to get first-level children of an element in jsoup

In jsoup Element.children() returns all children (descendants) of Element. But, I want the Element's first-level children (direct children).
Which method can I use?
Element.children() returns direct children only. Since you get them bound to a tree, they have children too.
If you need the direct children elements without the underlying tree structure then you need to create them as follows
public static void main(String... args) {
    Document document = Jsoup
            .parse("<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>");
    Element div = document.select("div").first();
    Elements divChildren = div.children();
    Elements detachedDivChildren = new Elements();
    for (Element elem : divChildren) {
        Element detachedChild = new Element(Tag.valueOf(elem.tagName()),
                elem.baseUri(), elem.attributes().clone());
        detachedDivChildren.add(detachedChild);
    }
    System.out.println(divChildren.size());
    for (Element elem : divChildren) {
        System.out.println(elem.tagName());
    }
    System.out.println("\ndivChildren content: \n" + divChildren);
    System.out.println("\ndetachedDivChildren content: \n"
            + detachedDivChildren);
}
Output
2
ul
p
divChildren content:
<ul>
<li>11</li>
<li>22</li>
</ul>
<p>ppp<span>sp</span></p>
detachedDivChildren content:
<ul></ul>
<p></p>
This should give you the desired list of direct descendants of the parent node:
Elements firstLevelChildElements = doc.select("parent-tag > *");
Alternatively, retrieve the parent element, get its first child via child(int index), and then retrieve that child's siblings via siblingElements().
This gives you the list of first-level children excluding the child you started from, so you have to add that child back yourself.
Elements firstLevelChildElements = doc.child(0).siblingElements();
You can also use Element.child(index), where the index selects which child you want.
Here is how to get the text of the first-level children:
Element addDetails = doc.select("div.container > div.main-content > div.clearfix > div.col_7.post-info > ul.no-bullet").first();
Elements divChildren = addDetails.children();
for (Element elem : divChildren) {
System.out.println(elem.text());
}

XPath query returns duplicate nodes

I have a SOAP response that I'm processing in Java. It has an element with several different child elements. I'm using the following code to try to grab all of the bond nodes and find which one has a child tag with a value of ACTIVE. The NodeList returned by the initial evaluate statement contains 4 nodes, which is the correct number of children in the SOAP response, but they all appear to be duplicates of the first element. Here is the code:
NodeList nodes = (NodeList)xpath.evaluate("//:bond", doc, XPathConstants.NODESET);
for(int i = 0; i < nodes.getLength(); i++){
    HashMap<String, String> map = new HashMap<String, String>();
    Element bond = (Element)nodes.item(i);
    // Get only active bonds
    String status = xpath.evaluate("//:status", bond);
    String id = xpath.evaluate("//:instrumentId", bond);
    if(!status.equals("ACTIVE"))
        continue;
    map.put("isin", xpath.evaluate(":isin", bond));
    map.put("cusip", xpath.evaluate(":cusip", bond));
}
Thanks for your help,
Jared
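The answer to your immediate question is that expressions like //:status ignore the node that you pass in and start from the root of the document, so every iteration finds the first match in the whole document. A relative path such as :status (the form you already use for :isin and :cusip) is evaluated against the bond node instead.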
However, there's probably an easier solution than what you've got, by using XPath to apply the test to the node. I think this should work, although it might contain typos (in particular, I can't remember whether text() can stand on its own or must be used in a predicate expression):
//:bond/:status[text()='ACTIVE']/..
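The difference between the absolute and the relative form can be seen in a small self-contained sketch. Everything here is made up for illustration: the class RelativeXPathDemo, the sample document, and the ISIN values; namespaces are left out to keep it short, so the paths are written as //status and status rather than with the colon form from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML below is invented sample data without namespaces.
class RelativeXPathDemo {

    static final String XML =
        "<bonds>"
      + "<bond><status>INACTIVE</status><isin>AAA</isin></bond>"
      + "<bond><status>ACTIVE</status><isin>BBB</isin></bond>"
      + "</bonds>";

    // Evaluates the given expression once per bond, with the bond as context node.
    static List<String> evaluatePerBond(String expression) {
        return evaluate("//bond", expression);
    }

    // Selects nodes with selectExpr, then evaluates perNodeExpr against each of them.
    static List<String> evaluate(String selectExpr, String perNodeExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(selectExpr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(xpath.evaluate(perNodeExpr, nodes.item(i)));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("absolute //status: " + evaluatePerBond("//status"));
        System.out.println("relative status:   " + evaluatePerBond("status"));
        // Filtering in the selecting expression avoids the status check in Java.
        System.out.println("active isins:      " + evaluate("//bond[status='ACTIVE']", "isin"));
    }
}
```

The absolute form yields the first status in the document for every bond, which is what makes the nodes look like duplicates, while the relative form yields each bond's own status.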
