getNodeName matches an XML node, but XPath can't find it - java

This feels like such a noob question.
I'm looking at a pile of Java code that manipulates an XML DOM. (The classes are the stock org.w3c.dom.Document and javax.xml.xpath.XPath and such that ship with JDK 7.) It has a ton of places that look like this:
String expr = "/fixed/path/through/the/hierarchy";
// actual code reuses factory instances, etc
XPath xpath = XPathFactory.newInstance().newXPath();
Node topNode = someDocumentInstance.getFirstChild();
Node node = (Node) xpath.evaluate(expr, topNode, XPathConstants.NODE);
NodeList children = node.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    if (child.getNodeName().equalsIgnoreCase("somePrefix:someTag")) {
        // "return child;" or otherwise break out of the loop
    }
}
And it all works. But that loop seems a tedious effort; if we're already using XPath to get a node, why then iterate over that node's children looking for a known tag?
So I tried to rewrite a section to fetch the child node directly. But querying using
String expr = "/fixed/path/through/the/hierarchy/somePrefix:someTag";
never matches anything. I've tried variations like requesting XPathConstants.NODESET or .STRING, but still no results. (There should only ever be one of these nodes anyhow.)
I feel like I'm missing something supremely obvious here, but I can't figure out why the full query fails, when the query-for-parent plus a manual loop through the children works. Is XPath testing some quality of a node beyond getNodeName() when I use a query like that?
The only theory I've come up with is that it has something to do with XML namespaces, which aren't used in this project. (There's actually a call to .setNamespaceAware(false) on the DocumentBuilderFactory instance with a comment saying "leave this off or everything everywhere breaks".)

If you're parsing without namespaces, then you should leave somePrefix out of your expression:
String expr = "/fixed/path/through/the/hierarchy/someTag";
The reason for this is that XPath performs matches on namespace and local name, not qualified name (which is what getNodeName() returns). If you put a prefix in your XPath expression, the XPath interpreter will use that to retrieve the namespace from its namespace mapping. Since you haven't given it any mappings, that will fail.
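To make the distinction concrete, here is a small sketch using a made-up document (fixed, hierarchy and somePrefix:someTag are placeholders). Parsed without namespace awareness, the element's getNodeName() is the full qualified name, while getLocalName() is null because no namespace processing took place. If you want XPath to test the same string that getNodeName() returns without switching namespace awareness on, one option is a name() predicate. This assumes the JDK's built-in XPath implementation, which should report the DOM node name for name() on such a document; note also that, unlike equalsIgnoreCase, the XPath comparison is case-sensitive.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class PrefixedNameDemo {
    // Hypothetical sample: a prefixed element with no namespace declaration,
    // which a namespace-unaware parser accepts as-is.
    static final String XML =
        "<fixed><hierarchy><somePrefix:someTag>found</somePrefix:someTag></hierarchy></fixed>";

    static Document parseWithoutNamespaces(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false); // same setting as the project in the question
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Compares against the qualified name, i.e. the string getNodeName() returns
    static Node findByQualifiedName(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "/fixed/hierarchy/*[name() = 'somePrefix:someTag']";
        return (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithoutNamespaces(XML);
        Node tag = doc.getDocumentElement().getFirstChild().getFirstChild();
        System.out.println(tag.getNodeName());  // somePrefix:someTag
        System.out.println(tag.getLocalName()); // null
        System.out.println(findByQualifiedName(doc).getTextContent()); // found
    }
}
```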
Also, you probably want to use NODESET if you're going to iterate over the child nodes.

Related

Evaluate many elements with XPathExpression and NODESET

I parse a very large XML file (from jpylyzer, a JP2 properties extractor). This XML contains properties of many JP2 images, each one with the same elements, like:
//results/jpylyzer/fileInfo/fileName
//results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox/height
//results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox/width
//results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox/bPCDepth
To reduce processing time, I'm using this method:
for (XPathExpression xPathExpression : listXPathExpression) {
nodeList = (NodeList) xPathExpression.evaluate(document, XPathConstants.NODESET);
//we use our list
}
It's very convenient and fast, but it relies on every expression returning the same number of elements.
Because some properties exist only for some images, some XPath expressions find no value for those images.
The nodeList is filled ONLY with the values that were found, which is a problem: there's no way to match those values with the others, because the lists differ in size depending on how many properties were found.
Is there a way to fill "blank" when no value is found ?
What you want is not possible with a single XPath expression, not even with version 2.0. In such a case, you have to reach for the higher-level language you embed XPath in.
As I'm not very familiar with Java, I cannot give you specific code, but I can explain what you have to do.
I assume an XML document similar to
<results>
<jpylyzer>
<fileInfo>
<fileName>Name of file</fileName>
</fileInfo>
<properties>
<jp2HeaderBox>
<imageHeaderBox>
<height>45</height>
<width>66</width>
<bPCDepth>386</bPCDepth>
</imageHeaderBox>
<imageHeaderBox>
<width>32</width>
</imageHeaderBox>
</jp2HeaderBox>
</properties>
</jpylyzer>
</results>
As a starting point, find an element that really is present in all XML documents, in all situations. For the sake of an example, let us assume imageHeaderBox is present everywhere, but its children height, width and bPCDepth are not necessarily there.
Find an XPath expression for the imageHeaderBox element:
/results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox
Evaluate the expression and save the result to a nodeList. Next, process this list further. This only works if XPath expressions can be applied to the individual items in a nodeList, but it seems you are optimistic about that:
I can iterate over the nodeList. I guess I can evaluate too.
Iterate over the nodeList (the result of the imageHeaderBox expression) and apply another path expression to each item.
XPath 2.0
In XPath 2.0, you can use an if/then statement that checks for the presence of a node. Assuming the imageHeaderBox element node as the context item:
if(height) then height else 'e.g. text saying there is no height'
XPath 1.0
With XPath 1.0, it's slightly more complicated:
concat(height, substring('e.g. text saying there is no height', 1 div not(height)))
See Dimitre Novatchev's answer here for an explanation. The technique is known as the Becker method, probably introduced here.
Finally, the result list should look similar to
45
e.g. text saying there is no height
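The XPath engine that ships with the JDK (javax.xml.xpath) implements XPath 1.0, so the if/then form is not available there, but the two steps can be sketched in Java with the XPath 1.0 expression. This is a minimal sketch assuming the sample document; 'no height' is an arbitrary placeholder, the path follows the sample's nesting (imageHeaderBox sits inside jp2HeaderBox), and only height is shown (width and bPCDepth would follow the same pattern).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PerBoxValues {
    // Trimmed-down version of the sample document
    static final String SAMPLE = "<results><jpylyzer><properties><jp2HeaderBox>"
            + "<imageHeaderBox><height>45</height><width>66</width></imageHeaderBox>"
            + "<imageHeaderBox><width>32</width></imageHeaderBox>"
            + "</jp2HeaderBox></properties></jpylyzer></results>";

    static List<String> heights(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Step 1: the element assumed to exist for every image
        XPathExpression boxes = xpath.compile(
                "/results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox");
        // Step 2: relative expression using the Becker method; yields the
        // height text, or the placeholder when height is missing
        XPathExpression height = xpath.compile(
                "concat(height, substring('no height', 1 div not(height)))");

        NodeList boxList = (NodeList) boxes.evaluate(doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < boxList.getLength(); i++) {
            // Each box is the context node, so there is exactly one entry per box
            values.add(height.evaluate(boxList.item(i)));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(heights(SAMPLE)); // [45, no height]
    }
}
```

Because every list built this way has one entry per imageHeaderBox, lists for different properties stay aligned by index.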

Meaning of #text in DOM parser

I'm relatively new to XML parsers, trying to understand some java code using DOM api to parse an XML document.
I need to know what '#text' means in the following code, or even what this line of code does:
if(!ChildNode.getNodeName().equals("#text"))
{
//do something
}
According to the JavaDoc, #text is the value of the nodeName attribute for nodes implementing the Text interface.
i.e. if a node in the document is a text node (as opposed to, for example, an element), its nodeName will be #text.
The code in question appears to be checking that the node referenced by ChildNode is not a text node before performing some action. Presumably, the action is something that can't be performed on a text node, like querying or adding to its children.
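A short sketch of what the check sees in practice, assuming a default (non-validating) DocumentBuilderFactory, which keeps the whitespace between elements as text nodes. It also shows a common alternative, comparing getNodeType() with Node.ELEMENT_NODE; unlike the '#text' name check, that also skips comments ("#comment") and CDATA sections ("#cdata-section"). The sample XML is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeDemo {
    static final String SAMPLE = "<a>\n  <b>x</b>\n  <!-- note -->\n</a>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // getNodeName() of every child of the root element, in document order
    static List<String> childNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // Keeps only element children by checking the node type instead of the name
    static List<String> elementChildNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(childNames(doc));        // [#text, b, #text, #comment, #text]
        System.out.println(elementChildNames(doc)); // [b]
    }
}
```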

How do I check if an XML node is a leaf node in Java?

I want to list of all the leaf nodes present in an XML document. The XML is not fixed, thus the code should work for any given XML file.
Find an XML parser. These libraries parse the XML string for you and build an object-oriented tree of the XML nodes (called a DOM, which stands for Document Object Model). There should be a method along the lines of getChildCount(), getChildren() or isLeaf(); in the W3C DOM that ships with Java, the equivalents are getChildNodes() and hasChildNodes().
Take a look here: Best XML parser for Java
If you are using the DOM:
if (!myNode.hasChildNodes())
{
// found a leaf node
}
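Note that hasChildNodes() is also true for an element that only contains text, such as <b>x</b>, because the text is itself a child node. If "leaf" should mean "element with no child elements", a sketch like the following walks the whole tree from the root, so it works for any document; the sample XML is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LeafElements {
    // An element counts as a leaf when none of its children is an element
    static boolean isLeafElement(Element e) {
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    // Depth-first walk collecting leaf element names in document order
    static void collect(Element e, List<String> out) {
        if (isLeafElement(e)) {
            out.add(e.getNodeName());
            return;
        }
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, out);
            }
        }
    }

    static List<String> leafNames(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(leafNames("<a><b>x</b><c><d/><e>y</e></c></a>")); // [b, d, e]
    }
}
```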

Searching for the first matching element after a specific node (XPath and ITunes XML)

It's not necessary to post my full code because I have just a short question. I'm using XPath to search an XML document for a text value. I have XML like:
<key>Name</key>
<string>Dat Ass</string>
<key>Artist</key>
<string>Earl Sweatshirt</string>
<key>Album</key>
<string>Kitchen Cutlery</string>
<key>Kind</key>
<string>MPEG-Audiodatei</string>
I have an Expression like this:
//string[preceding-sibling::key[text()[contains(., 'Name')]]]/text()
but this gives me ALL the following string tags; I just want the first one, with the song title.
greets Alex
Use:
(//string[preceding-sibling::key[1] = 'Name'])[1]/text()
Alternatively, one can use a forward-only expression:
(//key[. = 'Name'])[1]/following-sibling::string[1]/text()
Do note: this is a common error. Any expression of the kind:
//someExpr[1]
doesn't select "the first node in the document from all nodes selected by //someExpr". In fact, it can select many nodes.
The above expression selects any node that is selected by //someExpr and that is the first such child of its parent.
This is why, without the parentheses, the other answer to this question is generally incorrect.
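The difference can be checked with the JDK's javax.xml.xpath API. This sketch uses a made-up, stripped-down document with two tracks, each in its own dict (roughly how an iTunes library lists tracks; real files have many more keys).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstMatchDemo {
    // Hypothetical two-track sample
    static final String SAMPLE = "<plist><dict>"
            + "<key>Name</key><string>Track A</string><key>Artist</key><string>Artist A</string>"
            + "</dict><dict>"
            + "<key>Name</key><string>Track B</string><key>Artist</key><string>Artist B</string>"
            + "</dict></plist>";

    // Text content of every node the expression selects, in document order
    static List<String> select(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // [1] applies per parent: one match in each <dict>
        System.out.println(select(SAMPLE, "//string[preceding-sibling::key[1] = 'Name'][1]"));
        // [Track A, Track B]

        // Parenthesized: [1] applies to the whole node-set
        System.out.println(select(SAMPLE, "(//string[preceding-sibling::key[1] = 'Name'])[1]"));
        // [Track A]

        System.out.println(select(SAMPLE, "(//key[. = 'Name'])[1]/following-sibling::string[1]"));
        // [Track A]
    }
}
```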
You can just add another predicate [1] to select the first matching node. The nested predicate using text() should be unnecessary:
//string[preceding-sibling::key[contains(., 'Name')]][1]/text()
Another, perhaps more efficient, way to select this node would be
//key[contains(., 'Name')]/following-sibling::*[1][self::string]
This selects the first node (with any name) following the wanted key node and tests if its name is string.

Store XML data in DOM parser [duplicate]

I am new to Java and the XML DOM parser. I have a requirement to read XML data and store it in the form of columns and rows.
Example: sample.xml file
<staff>
<firstname>Swetha</firstname>
<lastname>EUnis</lastname>
<nickname>Swetha</nickname>
<salary>10000</salary>
</staff>
<staff>
<firstname>John</firstname>
<lastname>MAdiv</lastname>
<nickname>Jo</nickname>
<salary>200000</salary>
</staff>
I need to read this XML file and store it in the following format:
firstName,lastName,nickName,Salary
swetha,Eunis,swetha,10000
john,MAdiv,Jo,200000
Java Code:
NodeList nl = doc.getElementsByTagName("*");
for (int i = 0; i < nl.getLength(); i++)
{
    Element section = (Element) nl.item(i);
    Node title = section.getFirstChild();
    while (title != null && title.getNodeType() != Node.ELEMENT_NODE)
    {
        title = title.getNextSibling();
        if (title != null)
        {
            String first = title.getFirstChild().getNodeValue().trim();
            if (first != null)
            {
                title = title.getNextSibling();
            }
            System.out.print(first + ",");
        }
    }
    System.out.println("");
} //for
I wrote the above code, but I can't find a way to get the data into the column-and-row format shown above. Can anyone please help me solve this issue? I have been looking into it for many days.
Since this looks like homework, I'm going to give you some hints:
The chances are that your lecturer has given you some lecture notes and/or examples on processing an XML DOM. Read them all again.
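The getElementsByTagName method takes an element name as a parameter. The one special value is "*", which matches every element, so the call returns all the elements in the document as one flat list (staff elements and their fields mixed together), not one row per staff member.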
Your code needs to mirror the structure of the XML. The XML structure in this case consists of N staff elements, each of which contains elements named firstname, lastname, nickname and salary.
It is also possible that your lecturer expects you to use something like XSLT or an XML binding mechanism to simplify this. (Or maybe this was intended to be XMI rather than XML ... in which case there are other ways to handle this ...)
I kept "*" as the getElementsByTagName parameter so that the data is read dynamically.
Well, it doesn't do what you want! With "*" you get every element in the document in one flat list, and the DOM getElementsByTagName method does NOT accept any other kind of pattern.
If you want to make your code generic, you can't rely on getElementsByTagName. You will need to walk the tree from the top, starting with the DOM's root node.
Can you please provide me with sample code?
No. Your lecturer would not approve of me giving you code to copy from. However, I will point out that there are lots of XML DOM tutorials on the web which should help you figure out what you need to do. The best thing is for you to do the work yourself. You will learn more that way ... and that is the whole point of your homework!
1. The DOM parser parses the entire XML file to create the DOM object.
2. You will always need to be aware of the type of output and the structure of the XML returned when a request is sent to a web service.
3. It's not the XML structure of a web-service reply that is dynamic; it's the child element values and attributes that can be dynamic.
4. You will need to handle this dynamic behavior with try/catch blocks...
For further details on the DOM parser, see this site:
http://tutorials.jenkov.com/java-xml/dom.html
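For reference, here is a minimal sketch of walking the tree from the root to produce rows. It assumes the staff elements are wrapped in a single root element (named staffList here, a made-up name), since the sample as posted has two top-level elements and is not well-formed XML on its own. The header row uses the element names as they appear in the XML, and the sketch assumes every staff record has the same fields in the same order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StaffToRows {
    static final String SAMPLE = "<staffList>"
            + "<staff><firstname>Swetha</firstname><lastname>EUnis</lastname>"
            + "<nickname>Swetha</nickname><salary>10000</salary></staff>"
            + "<staff><firstname>John</firstname><lastname>MAdiv</lastname>"
            + "<nickname>Jo</nickname><salary>200000</salary></staff>"
            + "</staffList>";

    // Child elements only, skipping whitespace text nodes and comments
    static List<Element> childElements(Node parent) {
        List<Element> elements = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements.add((Element) children.item(i));
            }
        }
        return elements;
    }

    // One comma-separated line per record; header taken from the first record
    static List<String> toRows(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        List<String> rows = new ArrayList<>();
        List<Element> records = childElements(root);
        if (records.isEmpty()) {
            return rows;
        }
        List<String> header = new ArrayList<>();
        for (Element field : childElements(records.get(0))) {
            header.add(field.getNodeName());
        }
        rows.add(String.join(",", header));
        for (Element record : records) {
            List<String> values = new ArrayList<>();
            for (Element field : childElements(record)) {
                values.add(field.getTextContent().trim());
            }
            rows.add(String.join(",", values));
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String row : toRows(SAMPLE)) {
            System.out.println(row);
        }
        // firstname,lastname,nickname,salary
        // Swetha,EUnis,Swetha,10000
        // John,MAdiv,Jo,200000
    }
}
```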
