XPath query returns duplicate nodes - java

I have a SOAP response that I'm processing in Java. It has an element with several different child elements. I'm using the following code to grab all of the bond nodes and find which ones have a child tag with a value of ACTIVE. The NodeList returned by the initial evaluate call contains 4 nodes, which is the correct number of children in the SOAP response, but they all appear to be duplicates of the first element. Here is the code:
NodeList nodes = (NodeList)xpath.evaluate("//:bond", doc, XPathConstants.NODESET);
for(int i = 0; i < nodes.getLength(); i++){
HashMap<String, String> map = new HashMap<String, String>();
Element bond = (Element)nodes.item(i);
// Get only active bonds
String status = xpath.evaluate("//:status", bond);
String id = xpath.evaluate("//:instrumentId", bond);
if(!status.equals("ACTIVE"))
continue;
map.put("isin", xpath.evaluate(":isin", bond));
map.put("cusip", xpath.evaluate(":cusip", bond));
}
Thanks for your help,
Jared

The answer to your immediate question is that expressions like //:status will ignore the node that you pass in, and start from the root of the document.
However, there's probably an easier solution than what you've got, by using XPath to apply the test to the node. I think this should work, although it might contain typos (in particular, I can't remember whether text() can stand on its own or must be used in a predicate expression):
//:bond/:status[text()='ACTIVE']/..
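To illustrate the difference between a relative path and one that starts with //, here is a minimal sketch using JAXP. It assumes a document without namespaces to keep things short; the class name ActiveBonds, the sample XML, and its values are made up for the example and are not taken from the SOAP response in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ActiveBonds {
    // Hypothetical sample data: element names mirror the question, values are invented.
    static final String SAMPLE =
        "<bonds>"
      + "<bond><status>ACTIVE</status><isin>A1</isin></bond>"
      + "<bond><status>MATURED</status><isin>B2</isin></bond>"
      + "<bond><status>ACTIVE</status><isin>C3</isin></bond>"
      + "</bonds>";

    // Returns the isin of every bond whose status is ACTIVE.
    static List<String> activeIsins(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate filters the bonds, so no status check is needed in Java.
            NodeList bonds = (NodeList) xpath.evaluate(
                "//bond[status='ACTIVE']", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < bonds.getLength(); i++) {
                Node bond = bonds.item(i);
                // "isin" is resolved against the current bond;
                // "//isin" would search from the document root instead.
                result.add(xpath.evaluate("isin", bond));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(activeIsins(SAMPLE));
    }
}
```

With a namespaced document you would additionally register a NamespaceContext on the XPath object and prefix the element names.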

Related

Simple dom4j parsing in Java - can't access child nodes

I know this is so easy and I've spent all day banging my head. I have an XML document like this:
<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
<Service>
<Name>WMS</Name>
<Title>Metacarta WMS VMaplv0</Title>
</Service>
<Capability>
<Layer>
<Name>Vmap0</Name>
<Title>Metacarta WMS VMaplv0</Title>
<Abstract>Vmap0</Abstract>
...
There can be multiple Layer nodes, and any Layer node can have a nested Layer node. I can quickly select all of the layer nodes and iterate through them with the following xpath code:
Map<String, String> uris = new HashMap<String, String>();
uris.put("wms", "http://www.opengis.net/wms");
XPath xpath1 = doc.createXPath("//wms:Layer");
xpath1.setNamespaceURIs(uris);
List nodes1 = xpath1.selectNodes(doc);
for (Iterator<?> layerIt = nodes1.iterator(); layerIt.hasNext();) {
Node node = (Node) layerIt.next();
}
I get back all Layer nodes. Perfect. But when I try to access each Name or Title child node, I get nothing. I've tried as many combinations as I can think of:
name = node.selectSingleNode("./wms:Name");
name = node.selectSingleNode("wms:Name");
name = node.selectSingleNode("Name");
etc., but it always returns null. I'm guessing it has something to do with the namespace, but all I'm after is the name and title text values for each of the Layer nodes I've obtained. Can anybody offer any help?
I believe that Node.selectSingleNode() evaluates the supplied XPath expression with an empty namespace context, so there is no way of accessing a node that is in a namespace by its prefixed name. It's necessary to use an expression such as *[local-name()='Name']. If you want/need a namespace context, execute XPath expressions via the XPath object.
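As a rough illustration of the local-name() approach, here is a sketch that uses the JDK's built-in JAXP API rather than dom4j, so that it runs without extra libraries; the XPath expressions are the part that carries over. The class name LayerNames and the nested "Inner" layer are assumptions added for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LayerNames {
    // Trimmed-down version of the document in the question; the nested layer is hypothetical.
    static final String SAMPLE =
        "<WMS_Capabilities version=\"1.3.0\" xmlns=\"http://www.opengis.net/wms\">"
      + "<Service><Name>WMS</Name></Service>"
      + "<Capability><Layer><Name>Vmap0</Name>"
      + "<Layer><Name>Inner</Name></Layer>"
      + "</Layer></Capability>"
      + "</WMS_Capabilities>";

    // Returns the Name of every Layer, matching on local names only.
    static List<String> layerNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is needed for local-name() to behave as expected.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList layers = (NodeList) xpath.evaluate(
                "//*[local-name()='Layer']", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < layers.getLength(); i++) {
                // Relative expression: only direct children of the current Layer are considered.
                names.add(xpath.evaluate("*[local-name()='Name']", layers.item(i)));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(layerNames(SAMPLE));
    }
}
```

Matching on local-name() ignores the namespace entirely, which is convenient but would also match same-named elements from other namespaces.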
Thanks everybody for the help. It was Michael Kay's last clue that got it for me... I needed to use a relative path from the current node, include the namespace URI, and select from the context of the current node I'm iterating through:
Map<String, String> uris = new HashMap<String, String>();
uris.put("wms", "http://www.opengis.net/wms");
XPath xpath1 = doc.createXPath("//wms:Layer");
xpath1.setNamespaceURIs(uris);
List nodes1 = xpath1.selectNodes(doc);
for (Iterator<?> layerIt = nodes1.iterator(); layerIt.hasNext();) {
Node node = (Node) layerIt.next();
XPath nameXpath = node.createXPath("./wms:Name");
nameXpath.setNamespaceURIs(uris);
XPath titleXpath = node.createXPath("./wms:Title");
titleXpath.setNamespaceURIs(uris);
Node name = nameXpath.selectSingleNode(node);
Node title = titleXpath.selectSingleNode(node);
}

reading xml file with multiple child node

Suppose I have an XML file like the one below.
<top>
<CRAWL>
<NAME>div[class=name],attr=0</NAME>
<PRICE>span[class~=(?i)(price-new|price-old)],attr=0</PRICE>
<DESC>div[class~=(?i)(sttl dyn|bin)],attr=0</DESC>
<PROD_IMG>div[class=image]>a>img,attr=src</PROD_IMG>
<URL>div[class=name]>a,attr=href</URL>
</CRAWL>
<CRAWL>
<NAME>img[class=img],attr=alt</NAME>
<PRICE>div[class=g-b],attr=0</PRICE>
<DESC>div[class~=(?i)(sttl dyn|bin)],attr=0</DESC>
<PROD_IMG>img[itemprop=image],attr=src</PROD_IMG>
<URL>a[class=img],attr=href</URL>
</CRAWL>
</top>
What I want is to first take all the values under one <CRAWL> element and, after finishing that operation, move on to the next one and repeat, even when there are more than two <CRAWL> elements. I have managed to get the values when only one is present. I use the values inside the tags for some other function; each <CRAWL> holds different values, and I use them for different operations. Everything else is fine; I just don't know how to loop over the elements when reading the XML file.
regards
If I'm understanding this correctly, you're trying to extract data from ALL tags that exist within your XML fragment. There are multiple solutions to this. I'm listing them below:
XPath: If you know exactly what your XML structure is, you can employ XPath for each node=CRAWL to find data within tags:
// Instantiate XPath variable
XPath xpath = XPathFactory.newInstance().newXPath();
// Define the exact XPath expressions you want to get data for:
XPathExpression name = xpath.compile("//top/CRAWL/NAME/text()");
XPathExpression price = xpath.compile("//top/CRAWL/PRICE/text()");
XPathExpression desc = xpath.compile("//top/CRAWL/DESC/text()");
XPathExpression prod_img = xpath.compile("//top/CRAWL/PROD_IMG/text()");
XPathExpression url = xpath.compile("//top/CRAWL/URL/text()");
At this point, each of the variables above holds a compiled expression rather than the data itself. Evaluating one against the parsed document with XPathConstants.NODESET returns the matching text nodes from every CRAWL element, which you could then copy into one list per tag.
The other (more efficient) solution is DOM-based parsing:
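If the values of each CRAWL need to stay together, one option is to select the CRAWL elements first and then evaluate relative expressions against each one. The following is only a sketch under that assumption; the class name CrawlReader, the TAGS array, and the shortened sample XML are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CrawlReader {
    // Shortened sample based on the question's file; only three of the tags are kept.
    static final String SAMPLE =
        "<top>"
      + "<CRAWL><NAME>div[class=name],attr=0</NAME>"
      + "<PRICE>div[class=price],attr=0</PRICE>"
      + "<URL>div[class=name]>a,attr=href</URL></CRAWL>"
      + "<CRAWL><NAME>img[class=img],attr=alt</NAME>"
      + "<PRICE>div[class=g-b],attr=0</PRICE>"
      + "<URL>a[class=img],attr=href</URL></CRAWL>"
      + "</top>";

    static final String[] TAGS = {"NAME", "PRICE", "URL"};

    // Returns one map per CRAWL element, keyed by tag name.
    static List<Map<String, String>> readCrawls(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList crawls = (NodeList) xpath.evaluate(
                "/top/CRAWL", doc, XPathConstants.NODESET);
            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < crawls.getLength(); i++) {
                Node crawl = crawls.item(i);
                Map<String, String> values = new LinkedHashMap<>();
                for (String tag : TAGS) {
                    // The tag name alone is a relative path, resolved against this CRAWL.
                    values.put(tag, xpath.evaluate(tag, crawl));
                }
                result.add(values);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not read CRAWL elements", e);
        }
    }

    public static void main(String[] args) {
        for (Map<String, String> crawl : readCrawls(SAMPLE)) {
            System.out.println(crawl);
        }
    }
}
```

Each map can then be handed to whatever per-CRAWL operation you need before moving on to the next element.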
// Instantiate the doc builder
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder xmlDocBuilder = domFactory.newDocumentBuilder();
Document xmlDoc = xmlDocBuilder.parse("xmlFile.xml");
// Create NodeList of element tag "CRAWL"
NodeList crawlNodeList = xmlDoc.getElementsByTagName("CRAWL");
// NodeList is not Iterable, so loop over the CRAWL elements by index
for (int c = 0; c < crawlNodeList.getLength(); c++) {
    Node node = crawlNodeList.item(c);
    // getChildNodes() returns a NodeList, which also contains whitespace text nodes
    NodeList subNodes = node.getChildNodes();
    for (int i = 0; i < subNodes.getLength(); i++) {
        Node child = subNodes.item(i);
        // Keep only the elements, e.g. NAME, PRICE, DESC, etc.
        if (child.getNodeType() != Node.ELEMENT_NODE) {
            continue;
        }
        String tag = child.getNodeName();
        String value = child.getTextContent();
        // Do something with tag and value
    }
}
Hope this helps!

JAXP XPath 1.0 or 2.0 - how to distinguish empty strings from non-existent values

Given the following XML instance:
<entities>
<person><name>Jack</name></person>
<person><name></name></person>
<person></person>
</entities>
I am using the following code to: (a) iterate over the persons and (b) obtain the name of each person:
XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
String innerXPath = "name/text()";
String name = xpath.compile(innerXPath).evaluate(node);
System.out.printf("%2d -> name is %s.\n", i, name);
}
The code above is unable to distinguish between the 2nd person case (empty string for name) and the 3rd person case (no name element at all) and simply prints:
0 -> name is Jack.
1 -> name is .
2 -> name is .
Is there a way to distinguish between these two cases using a different innerXPath expression? In this SO question it seems that the XPath way would be to return an empty list, but I've tried that too:
String innerXPath = "if (name) then name/text() else ()";
... and the output is still the same.
So, is there a way to distinguish between these two cases with a different innerXPath expression? I have Saxon HE on my classpath so I can use XPath 2.0 features as well.
Update
So the best I could do based on the accepted answer is the following:
XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
String innerXPath = "name";
NodeList names = (NodeList) xpath.compile(innerXPath).evaluate(node, XPathConstants.NODESET);
String nameValue = null;
if (names.getLength()>1) throw new RuntimeException("impossible");
if (names.getLength()==1)
nameValue = names.item(0).getFirstChild()==null?"":names.item(0).getFirstChild().getNodeValue();
System.out.printf("%2d -> name is [%s]\n", i, nameValue);
}
The above code prints:
0 -> name is [Jack]
1 -> name is []
2 -> name is [null]
In my view this is not very satisfactory as logic is spread in both XPath and Java code and limits the usefulness of XPath as a host language and API-agnostic notation. My particular use case was to just keep a collection of XPaths in a property file and evaluate them at runtime in order to obtain the information I need without any ad-hoc extra handling. Apparently that's not possible.
The JAXP API, being based on XPath 1.0, is pretty limited here. My instinct would be to return the name element (as a NodeList). So the XPath expression required is simply "name". Then cases 1 and 2 will return a nodelist of length 1, while case 3 will return a nodelist of length 0. Cases 1 and 2 can then easily be distinguished within the application by getting the value of the node and testing whether it is zero-length.
Using /text() is always best avoided anyway, since it causes your query to be sensitive to the presence of comments in the XML.
As a long-time user of Saxon XSLT, I'm pleased to find once again that I like Michael Kay's recommendation here. Generally, I like the pattern of returning a collection for queries, even for queries that are expected to return only at most one instance.
What I don't like doing is having to open a bundled interface to try to solve a particular need and then finding that one has to reimplement much of what the original interface handled.
Therefore, here's a method that uses Michael's recommendation while avoiding the cost of having to reimplement a Node-to-String transformation that is recommended in other comments in this thread.
@Nonnull
public Optional<String> findString( @Nonnull final String expression )
{
try
{
// for XPathConstants.STRING, XPath returns an empty string both for values of no length
// and for elements that are not present.
// therefore, ask for a NODESET and then retrieve the first Node if any
final FluentIterable<Node> matches =
IterableNodeList.from( (NodeList) xpath.evaluate( expression, node, XPathConstants.NODESET ) );
if ( matches.isEmpty() )
{
return Optional.absent();
}
final Node firstNode = matches.first().get();
// now let XPath process a known-to-exist Node to retrieve its String value
return Optional.fromNullable( (String) xpath.evaluate( ".", firstNode, XPathConstants.STRING ) );
}
catch ( XPathExpressionException xee )
{
return Optional.absent();
}
}
Here, XPath.evaluate is called a second time to do whatever it usually does to transform the first found Node to the requested String value. Without this, there is a risk that a re-implementation will yield a different result than a direct call for XPathConstants.STRING over the same source node and for the same expression.
Of course, this code is using Guava Optional and FluentIterable to make the intention more explicit. If you don't want Guava, use Java 8 or refactor the implementation using nulls and NodeList's own collection methods.
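For the Java 8 route, a sketch of the same idea with java.util.Optional and plain NodeList calls might look like the following. The class name XPathStrings, the names helper, and the explicit xpath and context parameters are assumptions made so that the example is self-contained; the sample XML is the one from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathStrings {
    static final String SAMPLE =
        "<entities>"
      + "<person><name>Jack</name></person>"
      + "<person><name></name></person>"
      + "<person></person>"
      + "</entities>";

    // Empty Optional: nothing matched. Optional of "": matched, but the value is empty.
    static Optional<String> findString(XPath xpath, Node context, String expression) {
        try {
            NodeList matches = (NodeList) xpath.evaluate(
                expression, context, XPathConstants.NODESET);
            if (matches.getLength() == 0) {
                return Optional.empty();
            }
            // Let XPath compute the string value of a node that is known to exist.
            return Optional.of(xpath.evaluate(".", matches.item(0)));
        } catch (XPathExpressionException e) {
            return Optional.empty();
        }
    }

    // Applies findString("name") to every person in the document.
    static List<Optional<String>> names(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList persons = (NodeList) xpath.evaluate(
                "/entities/person", doc, XPathConstants.NODESET);
            List<Optional<String>> result = new ArrayList<>();
            for (int i = 0; i < persons.getLength(); i++) {
                result.add(findString(xpath, persons.item(i), "name"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names(SAMPLE));
    }
}
```

Like the Guava version, this swallows XPathExpressionException and reports it as "absent", which may or may not be what you want for malformed expressions.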

Java XPath : iterating over a collection of nodes and their indices

I have this XML instance document:
<entities>
<person>James</person>
<person>Jack</person>
<person>Jim</person>
</entities>
And with the following code I iterate over the person nodes and print their names:
XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
String nodeName = node.getNodeName();
String name = xpath.compile("text()").evaluate(node).trim();
System.out.printf("node type = %s, node name = %s\n", nodeName, name);
}
Now what I would like is to also have access to the index of each node.
I know I can trivially get it from the i loop variable but I want to get it as an XPath expression instead, preferably in no different way than I get the value of the text() XPath expression.
My use-case is that I am trying to handle all attributes I collect as XPath expressions (which I load at run-time from a config file) so that I minimize non-generic code, so I don't want to treat the index as a special case.
You'd have to use a trick like counting the preceding siblings:
count(preceding-sibling::person)
which gives 0 for the first person, 1 for the second one, etc.
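A minimal sketch of how that expression could be evaluated per node with JAXP, in the same way as any other configured expression. The class name PersonIndex and the indexes helper are hypothetical; the result is requested as a NUMBER and converted to an int.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PersonIndex {
    static final String SAMPLE =
        "<entities>"
      + "<person>James</person>"
      + "<person>Jack</person>"
      + "<person>Jim</person>"
      + "</entities>";

    // Returns the zero-based index of each person, computed by XPath rather than the loop variable.
    static List<Integer> indexes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList persons = (NodeList) xpath.evaluate(
                "/entities/person", doc, XPathConstants.NODESET);
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < persons.getLength(); i++) {
                Node person = persons.item(i);
                // Counts the person elements that come before this one under the same parent.
                Double index = (Double) xpath.evaluate(
                    "count(preceding-sibling::person)", person, XPathConstants.NUMBER);
                result.add(index.intValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indexes(SAMPLE));
    }
}
```

Because the index is just another XPath expression, it can sit in the same config file as the other attribute expressions.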
Try using position()
String index = xpath.compile("position()").evaluate(node).trim();

Java DOM: How to get how many child elements

I have an XML Document:
<entities xmlns="urn:yahoo:cap">
<entity score="0.988">
<text end="4" endchar="4" start="0" startchar="0">Messi</text>
<wiki_url>http://en.wikipedia.com/wiki/Lionel_Messi</wiki_url>
<types>
<type region="us">/person</type>
</types>
</entity>
</entities>
I have a TreeMap<String,String> data which stores the getTextContent() for both the "text" and "wiki_url" elements. Some "entity" elements will only have the "text" element (no "wiki_url"), so I need a way of finding out when the text element is the only child and when there is also a "wiki_url". I could use document.getElementsByTagName("text") and document.getElementsByTagName("wiki_url"), but then I would lose the relationship between the text and the URL.
I'm trying to get the number of elements within the "entity" element by using:
NodeList entities = document.getElementsByTagName("entity"); //List of all the entity nodes
int nchild; //Number of children
System.out.println("Number of entities: "+ entities.getLength()); //Prints 1 as expected
nchild=entities.item(0).getChildNodes().getLength(); //Returns 7
However, as shown above, this returns 7, which I don't understand; surely it's 3, or 4 if you include the grandchild.
I was then going to use the number of children to loop through them all, check whether getNodeName().equals("wiki_url"), and save it to data if so.
Why am I getting 7 as the number of children when I can only count 3 children and 1 grandchild?
The whitespace following the > of <entity score="0.988"> also counts as a node; similarly, the end-of-line characters between the tags are parsed into text nodes. That is why you get 7: three elements plus four whitespace text nodes. If you are interested in a particular child by name, add a helper method like the one below and call it wherever you need it.
Node getChild(final NodeList list, final String name)
{
for (int i = 0; i < list.getLength(); i++)
{
final Node node = list.item(i);
if (name.equals(node.getNodeName()))
{
return node;
}
}
return null;
}
and call it like this:
final NodeList childNodes = entities.item(0).getChildNodes();
final Node textNode = getChild(childNodes, "text");
final Node wikiUrlNode = getChild(childNodes, "wiki_url");
Normally, when working with DOM, come up with helper methods like the one above to simplify the main processing logic.
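To see where the 7 comes from, here is a small sketch that counts all child nodes and then only the element children. The class name ChildCounter and the counts helper are made up for the example, and the sample is a shortened copy of the document in the question, with line breaks kept between the tags.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildCounter {
    // The "\n" between tags is what produces the extra whitespace text nodes.
    static final String SAMPLE =
        "<entities xmlns=\"urn:yahoo:cap\">\n"
      + "<entity score=\"0.988\">\n"
      + "<text>Messi</text>\n"
      + "<wiki_url>http://en.wikipedia.com/wiki/Lionel_Messi</wiki_url>\n"
      + "<types><type region=\"us\">/person</type></types>\n"
      + "</entity>\n"
      + "</entities>";

    // Returns {all child nodes, element children only} for the first entity.
    static int[] counts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getElementsByTagName("entity").item(0).getChildNodes();
            int elements = 0;
            for (int i = 0; i < children.getLength(); i++) {
                // Whitespace shows up as TEXT_NODE; only ELEMENT_NODE children are counted here.
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    elements++;
                }
            }
            return new int[] {children.getLength(), elements};
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        int[] c = counts(SAMPLE);
        System.out.println("all nodes: " + c[0] + ", elements: " + c[1]);
    }
}
```

Filtering on getNodeType() keeps the relationship between text and wiki_url intact, since both are read from the same entity.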
