Java XPath - select all nodes does not work with namespaces - java

I would like to select all matching elements (in my case //ab) from the given XML, but nothing is selected (the matches variable contains an empty list):
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//ab");
NodeList matches = (NodeList) expr.evaluate(new InputSource(new ByteArrayInputStream(
("<x:xml xmlns:x=\"yyy\">"+
"<x:ab>123</x:ab>"+
"</x:xml>").getBytes())), XPathConstants.NODESET);
if (matches != null)
{
for (int i = 0; i < matches.getLength(); i++)
{
Node node = (Node) matches.item(i);
System.out.println(node.getNodeName());
System.out.println(node.getNodeValue());
}
}
Interestingly, when I change the XPath to *//ab, it does select the ab element.
Also, when I remove the namespace from the XML, the XPath //ab selects the ab element.
Likewise, when I set a NamespaceContext on the xpath object and add the prefix to the query (//x:ab), it selects the ab element.
How should I change the code so that I get the ab element without changing the query?
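For context: in XPath 1.0 an unprefixed name test such as ab only matches elements that are in no namespace, so on a namespace-aware document //ab is not expected to match x:ab. The sketch below, which assumes namespace-aware parsing, compares the original query with the two usual workarounds: matching on local-name() and binding a prefix through a NamespaceContext. Both do change the query. The class NamespaceDemo, the helpers count and bind, and the prefix p are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Same document as in the question.
    static final String XML = "<x:xml xmlns:x=\"yyy\"><x:ab>123</x:ab></x:xml>";

    // Counts the nodes that the expression selects in XML; context may be null.
    static int count(String expression, NamespaceContext context) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // assumption: namespace-aware parsing
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            if (context != null) {
                xpath.setNamespaceContext(context);
            }
            NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: a NamespaceContext that knows exactly one prefix.
    static NamespaceContext bind(final String prefix, final String uri) {
        return new NamespaceContext() {
            public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String u) {
                return uri.equals(u) ? prefix : null;
            }
            public Iterator<String> getPrefixes(String u) {
                return uri.equals(u)
                        ? Collections.singletonList(prefix).iterator()
                        : Collections.<String>emptyList().iterator();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(count("//ab", null));                   // 0: unprefixed name, no namespace
        System.out.println(count("//*[local-name()='ab']", null)); // 1: ignores the namespace
        System.out.println(count("//p:ab", bind("p", "yyy")));     // 1: prefix bound to the same URI
    }
}
```

The prefix used in the query does not have to match the one in the document; only the namespace URI has to be the same.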

Related

XPath compiling behaviour

I am testing my application and realised that behaviour is different when compiling.
For example, if my expression to compile is :
XPathExpression expr = xPath.compile("/DocDetails/TransactionSignature");
And :
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
x is declared as a String.
Let's say that x in expr2 is "abc"; the XPathExpression compiles with no issues.
But if x in expr2 is "123abc" or "123", compiling the XPathExpression throws:
javax.xml.transform.TransformerException: A location step was expected
following the '/' or '//' token.
I am just curious about this behaviour.
Here is the full code for reference:
String document = "C:/Users/Eunice/Documents/MITS/doc.xml";
String document2 = "C:/Users/Eunice/Documents/MITS/doc2.xml";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(document);
Document doc2 = builder.parse(document2);
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression expr = xPath.compile("/DocDetails/TransactionSignature");
Node node = (Node)expr.evaluate(doc, XPathConstants.NODE);
String x = node.getTextContent();
System.out.println(x);
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
Node node2 = (Node)expr2.evaluate(doc2, XPathConstants.NODE);
if (node2 == null)
System.out.println("null");
else
System.out.println("not null " + node2.getTextContent());
And this is the XML file:
<DocDetails>
<TransactionSignature>abc123</TransactionSignature>
</DocDetails>
But if x in expr2 is "123abc" OR "123", XPathExpression throws a
An XML element name cannot start with a digit. Hence your example is equivalent to
XPathExpression expr2 = xPath.compile("/DocDetails/123abc");
and the XPath parser does not accept that as a name test.
You should also provide the full XML. I am confident it does not contain anything like <DocDetails><TransactionSignature>abc123</TransactionSignature><123abc>something</123abc></DocDetails>, because that is simply not well-formed XML.
I finally found the answer after much searching!
It is actually illegal for an element name to start with a digit,
as can be seen in this Stack Overflow answer.
Originally, this line was throwing a transformer exception:
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
Since a name cannot start with a digit, the parser probably rejects it as an invalid name.
That would mean this line is effectively read as "/DocDetails/" instead of "/DocDetails/123" or "/DocDetails/123abc",
leaving a trailing '/' with no location step after it, hence the transformer exception.
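If the child name comes from data, one way to sidestep the problem is not to splice it into the path at all. The sketch below is an alternative technique, not what the answers above propose: it passes the name through an XPath variable and compares it with name(), so a value such as "123abc" simply matches nothing instead of producing an invalid expression. The class ChildByNameDemo, the helper childText, the variable $childName and the sample XML in the usage note are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildByNameDemo {
    // Returns the text of the first child of DocDetails called childName, or null if there is none.
    static String childText(String xml, final String childName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The name travels as data through $childName; the expression text stays constant.
            xpath.setXPathVariableResolver(new XPathVariableResolver() {
                public Object resolveVariable(QName variable) {
                    return "childName".equals(variable.getLocalPart()) ? childName : null;
                }
            });
            Node hit = (Node) xpath.evaluate(
                    "/DocDetails/*[name() = $childName]", doc, XPathConstants.NODE);
            return hit == null ? null : hit.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DocDetails><abc123>hit</abc123></DocDetails>"; // made-up sample
        System.out.println(childText(xml, "abc123")); // hit
        System.out.println(childText(xml, "123abc")); // null, and no exception
    }
}
```

Keeping the value out of the expression text also avoids surprises when it contains characters such as quotes or slashes.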

Xpath display correct results in XMLSpy but null in Java

I am trying to display all text within text nodes only, within an XFA XML document while ignoring namespaces.
I came up with an Xpath that returns the desired results within XMLSpy with xpath 1.0 but the same Xpath in Java returns null for some reason.
Xpath = //*[local-name()='text'][string-length(normalize-space(.))>0]
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
ArrayList<String> list = new ArrayList<>();
XPathExpression expr = xpath.compile("//*[local-name()='text'][string-length(normalize-space(.))>0]");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println("This prints null = " + nodes.item(i).getNodeValue());
}
XML file wouldn't post here so it can be viewed at the link below:
https://drive.google.com/file/d/1n-v3gzT-3GgxNnYKFUvMPjRQmtnkqcpY/view?usp=sharing
The problem is that the <text> elements themselves have no node value (getNodeValue() returns null for element nodes); the values are held by their child text nodes.
Replace the line
System.out.println("This prints null = " + nodes.item(i).getNodeValue());
with
System.out.println("This does not print null = " + nodes.item(i).getFirstChild().getNodeValue());
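A small sketch of the difference, assuming a made-up XML fragment rather than the XFA file from the question: getNodeValue() on the element is null, getFirstChild().getNodeValue() returns only the first text node, and getTextContent() returns all the text below the element. The class TextValueDemo and the helper firstTextElement are names invented here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextValueDemo {
    // Returns the first element whose local name is "text", or null if there is none.
    static Node firstTextElement(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate("//*[local-name()='text']", doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample with mixed content inside <text>.
        Node text = firstTextElement("<draw><text>Hello <b>big</b> world</text></draw>");
        System.out.println(text.getNodeValue());                 // null: elements have no node value
        System.out.println(text.getFirstChild().getNodeValue()); // "Hello " (first text node only)
        System.out.println(text.getTextContent());               // "Hello big world"
    }
}
```

getTextContent() is usually the safer choice, because getFirstChild() can be null for an empty element or can be a child element rather than a text node.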

XPATH evaluation against child node

I have xml as follows,
<students>
<Student><age>23</age><id>2000</id><name>PP2000</name></Student>
<Student><age>23</age><id>1000</id><name>PP1000</name></Student>
</students>
I have two kinds of XPath. The template XPath, students/Student, selects the template nodes, but I cannot hard-code it because it changes for other XMLs; the XML is quite dynamic and can grow (with the same base XPaths). To evaluate a further XPath against each template node, I'm using the following code:
XPath xpathResource = XPathFactory.newInstance().newXPath();
Document xmlDocument = //creating document;
NodeList nodeList = (NodeList)xpathResource.compile("//students/Student").evaluate(xmlDocument, XPathConstants.NODESET);
for (int nodeIndex = 0; nodeIndex < nodeList.getLength(); nodeIndex++) {
Node currentNode = nodeList.item(nodeIndex);
String xpathID = "//students/Student/id";
String xpathName = "//students/Student/name";
NodeList childID = (NodeList)xpathResource.compile(xpathID).evaluate(currentNode, XPathConstants.NODESET);
NodeList childName = (NodeList)xpathResource.compile(xpathName).evaluate(currentNode, XPathConstants.NODESET);
System.out.println("node ID " +childID.item(0).getTextContent());
System.out.println("node Name " +childName.item(0).getTextContent());
}
Now the problem is that this for loop executes twice, but both times I get 2000 and PP2000 as the ID and name values. Is there any way to evaluate a generic XPath against each child node in turn? I cannot run a generic XPath against the whole XML document, because I have some validation to do. I want to use the XML node list as result-set rows, so that I can validate each XML value and do my processing.
XPath xpathResource = XPathFactory.newInstance().newXPath();
Document xmlDocument = //creating document;
NodeList nodeList = (NodeList)xpathResource.compile("//students/Student/id").evaluate(xmlDocument, XPathConstants.NODESET);
for (int nodeIndex = 0; nodeIndex < nodeList.getLength(); nodeIndex++) {
Node currentNode = nodeList.item(nodeIndex);
System.out.println("node " +currentNode.getTextContent());
}
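A path that starts with / or // is resolved from the document root even when a Student node is passed to evaluate(), which is why every iteration sees the first student. A minimal sketch of a per-row evaluation follows, assuming the per-row XPaths can be written relative to the template node (id and name here). The class StudentRowsDemo and the helper rows are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StudentRowsDemo {
    static final String XML = "<students>"
            + "<Student><age>23</age><id>2000</id><name>PP2000</name></Student>"
            + "<Student><age>23</age><id>1000</id><name>PP1000</name></Student>"
            + "</students>";

    // Returns one "id name" string per Student element.
    static List<String> rows(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList students = (NodeList) xpath.evaluate(
                    "//students/Student", doc, XPathConstants.NODESET);
            // Relative paths: resolved against whatever node is passed to evaluate().
            XPathExpression idExpr = xpath.compile("id");
            XPathExpression nameExpr = xpath.compile("name");
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < students.getLength(); i++) {
                Node student = students.item(i);
                Node id = (Node) idExpr.evaluate(student, XPathConstants.NODE);
                Node name = (Node) nameExpr.evaluate(student, XPathConstants.NODE);
                result.add(id.getTextContent() + " " + name.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String row : rows(XML)) {
            System.out.println(row); // "2000 PP2000", then "1000 PP1000"
        }
    }
}
```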

How to narrow an XPath evaluation to a single node instead of a whole document in Java?

XML stream
<l>
<i>
<a>AAA</a>
<b>BBB</b>
<c>CCC</c>
</i>
<i>
<a>AAA2</a>
<b>BBB2</b>
<c>CCC2</c>
</i>
<i>
...
</i>
</l>
I want to output the following text with some Java code:
> CCC
> CCC2
...
Here is the code I wrote to produce the expected result:
Java code
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = docBuilder.parse("file:///C:/path/to/my/xml/stream.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//i");
NodeList listOfiNodes = (NodeList) expr.evaluate(d, XPathConstants.NODESET);
for(int i=0;i<listOfiNodes.getLength();i++) {
XPathExpression expr2 = xpath.compile("//c");
System.out.println("> " + ((Node) expr2.evaluate(listOfiNodes.item(i), XPathConstants.NODE)).getTextContent());
}
expr2 keeps on returning the first c node. So I get this output:
> CCC
> CCC
...
The evaluation performed by expr2 doesn't seem to "stay" on the node passed to evaluate() method. Why?
Note: I don't want to select the c nodes directly with the XPath //i/c (or /l/i/c).
Java 6
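//c is an absolute path: it starts at the root of the document that contains the context node, so it selects all matching nodes in the whole document no matter which node you pass to evaluate(). Use the relative path c instead and you will receive this output: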
> CCC
> CCC2
Note that the line printing the result throws a NullPointerException if an i node contains no c child, because evaluate() then returns null. The following code guards against that and should work as expected:
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = docBuilder.parse("stream.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//i");
NodeList listOfiNodes = (NodeList) expr.evaluate(d, XPathConstants.NODESET);
XPathExpression expr2 = xpath.compile("c"); // relative path, compiled once outside the loop
for (int i = 0; i < listOfiNodes.getLength(); i++) {
Node item = listOfiNodes.item(i);
Node node = (Node) expr2.evaluate(item, XPathConstants.NODE);
if (null != node) {
System.out.println("> " + node.getTextContent());
}
}
Replace "//c" with ".//c":
XPathExpression expr2 = xpath.compile(".//c");
This searches the descendants of the current context node instead of the whole document.
XPathExpression expr2 = xpath.compile(".//c");
for(int i=0;i<listOfiNodes.getLength();i++) {
System.out.println("> " + ((Node) expr2.evaluate(listOfiNodes.item(i), XPathConstants.NODE)).getTextContent());
}
Output:
> CCC
> CCC2

Looping over nodes and extracting specific subnode values using Java's XPath

I understand from Googling that it makes more sense to extract data from XML using XPath than by using DOM looping.
At the moment, I have implemented a solution using DOM, but the code is verbose, and it feels untidy and unmaintainable, so I would like to switch to a cleaner XPath solution.
Let's say I have this structure:
<products>
<product>
<title>Some title 1</title>
<image>Some image 1</image>
</product>
<product>
<title>Some title 2</title>
<image>Some image 2</image>
</product>
...
</products>
I want to be able to run a for loop for each of the <product> elements, and inside this for loop, extract the title and image node values.
My code looks like this:
InputStream is = conn.getInputStream();
DocumentBuilder builder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(is);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
Node n = products.item(i);
if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
Element product = (Element) n;
// do some DOM navigation to get the title and image
}
}
Inside my for loop I get each <product> as a Node, which is cast to an Element.
Can I simply use my instance of XPathExpression to compile and run another XPath on the Node or the Element?
Yes, you can. Compile a second, relative expression and evaluate it with each product element as the context node:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
expr = xpath.compile("title"); // The new xpath expression to find 'title' within 'product'.
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
Node n = products.item(i);
if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
Element product = (Element) n;
NodeList nodes = (NodeList) expr.evaluate(product,XPathConstants.NODESET); //Find the 'title' in the 'product'
System.out.println("TITLE: " + nodes.item(0).getTextContent()); // And here is the title
}
}
Here I have given an example of extracting the 'title' value. You can extract 'image' in the same way.
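As a variation on the answer above, the String-returning overload XPath.evaluate(String, Object) reads both fields without a NodeList and yields an empty string when a child is missing, instead of failing on item(0). This is only a sketch: the class ProductFieldsDemo, the helper fields, the " | " separator and the shortened sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ProductFieldsDemo {
    // Returns "title | image" for every product; a missing child yields an empty string.
    static List<String> fields(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList products = (NodeList) xpath.evaluate(
                    "/products/product", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < products.getLength(); i++) {
                Node product = products.item(i);
                // Relative expressions, evaluated with the product as context node.
                String title = xpath.evaluate("title", product);
                String image = xpath.evaluate("image", product);
                result.add(title + " | " + image);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: the second product has no image element.
        String xml = "<products>"
                + "<product><title>Some title 1</title><image>Some image 1</image></product>"
                + "<product><title>Some title 2</title></product>"
                + "</products>";
        for (String line : fields(xml)) {
            System.out.println(line);
        }
    }
}
```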
I'm not a big fan of this approach because you have to build a document (which might be expensive) before you can apply XPaths to it.
I've found VTD-XML a lot more efficient when it comes to applying XPaths to documents, because it indexes the parsed document instead of building a DOM tree of node objects. Here is some sample code:
final VTDGen vg = new VTDGen();
vg.parseFile("file.xml", false);
final VTDNav vn = vg.getNav();
final AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/products/product");
while (ap.evalXPath() != -1) {
System.out.println("PRODUCT:");
// you could either apply another xpath or simply get the first child
if (vn.toElement(VTDNav.FIRST_CHILD, "title")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Title: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
if (vn.toElement(VTDNav.FIRST_CHILD, "image")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Image: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
}
Also see this post on Faster XPaths with VTD-XML.
