Correct XPathExpression for retrieving Nodes in Java?

I have an XML document that contains multiple hhp:HourlyHistoricalPrice elements, like this:
<?xml version="1.0"?>
<hhp:HourlyHistoricalPrices xmlns:hhp="urn:or-HourlyHistoricalPrices">
<hhp:HourlyHistoricalPrice xmlns:hhp="urn:or-HourlyHistoricalPrice">
<hhp:indexId>1025127</hhp:indexId>
<hhp:resetDate>20161231T000000</hhp:resetDate>
<hhp:refSource>AIBO</hhp:refSource>
<hhp:indexLocation/>
<hhp:price1>50,870000</hhp:price1>
...
<hhp:price48>43,910000</hhp:price48>
</hhp:HourlyHistoricalPrice>
<hhp:HourlyHistoricalPrice xmlns:hhp="urn:or-HourlyHistoricalPrice">
<hhp:indexId>1025127</hhp:indexId>
<hhp:resetDate>20160101T000000</hhp:resetDate>
<hhp:refSource>AIBO</hhp:refSource>
<hhp:indexLocation/>
<hhp:price1>51,870000</hhp:price1>
...
<hhp:price48>49,910000</hhp:price48>
</hhp:HourlyHistoricalPrice>
<hhp:HourlyHistoricalPrice xmlns:hhp="urn:or-HourlyHistoricalPrice">
<hhp:indexId>1025127</hhp:indexId>
<hhp:resetDate>20163112T000000</hhp:resetDate>
<hhp:refSource>APX</hhp:refSource>
<hhp:indexLocation/>
<hhp:price1>63,870000</hhp:price1>
...
<hhp:price48>29,910000</hhp:price48>
</hhp:HourlyHistoricalPrice>
</hhp:HourlyHistoricalPrices>
I want to retrieve only the hhp:HourlyHistoricalPrice nodes that have a particular value for hhp:refSource, e.g. AIBO.
I tried the XPathExpression below, but it returns nothing.
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
String strExprssion =
"/hhp:HourlyHistoricalPrices/hhp:HourlyHistoricalPrice[hhp:refSource='AIBO']";
XPathExpression expression = xpath.compile(strExprssion);
NodeList nodes = (NodeList) expression.evaluate(originalXmlDoc, XPathConstants.NODESET);
System.out.println(nodes.getLength());
I would be grateful if somebody could advise on the correct expression to use.
Thanks a lot.

You need to expand the prefix into the XML namespace it represents:
String strExprssion = "//urn:or-HourlyHistoricalPrice:HourlyHistoricalPrice[urn:or-HourlyHistoricalPrice:refSource='AIBO']";
So for me, this test class
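import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Paths;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCheck {
    public static void main(String[] args) throws FileNotFoundException, IOException, XPathExpressionException {
        XPathFactory xpf = XPathFactory.newInstance();
        XPath xpath = xpf.newXPath();
        try (InputStream file = new FileInputStream(Paths.get("src", "inputFile.xml").toFile())) {
            String strExprssion = "//urn:or-HourlyHistoricalPrice:HourlyHistoricalPrice[urn:or-HourlyHistoricalPrice:refSource='AIBO']";
            XPathExpression expression = xpath.compile(strExprssion);
            NodeList nodes = (NodeList) expression.evaluate(new InputSource(file), XPathConstants.NODESET);
            System.out.println(nodes.getLength());
        }
    }
}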
outputs "2".
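A commonly used alternative in JAXP is to register a NamespaceContext on the XPath object and parse with a namespace-aware builder. The following is only a minimal sketch of that approach, not the method used in the answer above. The class name NamespaceXPathSketch, the helper countAibo, and the prefix p are made up for illustration. Note that in the question's XML the child elements rebind hhp to urn:or-HourlyHistoricalPrice, which differs from the root's namespace, so the sketch binds its prefix to that URI.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class NamespaceXPathSketch {
    // Namespace URI declared on the hhp:HourlyHistoricalPrice elements in the question.
    static final String PRICE_NS = "urn:or-HourlyHistoricalPrice";

    // Counts HourlyHistoricalPrice elements whose refSource child equals "AIBO".
    static int countAibo(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefix-based matching on a DOM
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the arbitrary prefix "p" (an assumption of this sketch) to the element namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "p".equals(prefix) ? PRICE_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return PRICE_NS.equals(uri) ? "p" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return PRICE_NS.equals(uri)
                            ? Collections.singletonList("p").iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    "//p:HourlyHistoricalPrice[p:refSource='AIBO']", doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

The prefix used in the expression does not have to match the prefix in the document; only the namespace URI matters.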

Related

XPath compiling behaviour

While testing my application, I noticed that XPath compilation behaves differently depending on the expression.
For example, if the expression to compile is:
XPathExpression expr = xPath.compile("/DocDetails/TransactionSignature");
and:
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
where x is declared as a String.
If x in expr2 is "abc", the XPathExpression compiles with no issues.
But if x in expr2 is "123abc" or "123", compiling the XPathExpression throws:
javax.xml.transform.TransformerException: A location step was expected
following the '/' or '//' token.
I am curious about the reason for this behaviour.
Here is the full code for reference:
String document = "C:/Users/Eunice/Documents/MITS/doc.xml";
String document2 = "C:/Users/Eunice/Documents/MITS/doc2.xml";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(document);
Document doc2 = builder.parse(document2);
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression expr = xPath.compile("/DocDetails/TransactionSignature");
Node node = (Node)expr.evaluate(doc, XPathConstants.NODE);
String x = node.getTextContent();
System.out.println(x);
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
Node node2 = (Node)expr2.evaluate(doc2, XPathConstants.NODE);
if (node2 == null)
    System.out.println("null");
else
    System.out.println("not null " + node2.getTextContent());
And this is the XML file:
<DocDetails>
<TransactionSignature>abc123</TransactionSignature>
</DocDetails>

Regarding "But if x in expr2 is "123abc" OR "123", XPathExpression throws a ...":
An XML element name cannot start with a digit. Your example is therefore equivalent to
XPathExpression expr2 = xPath.compile("/DocDetails/123abc");
which the XPath parser rejects, because 123abc is not a valid element name.
You should also provide the full XML. I am fairly sure it does not contain anything like <DocDetails><TransactionSignature>abc123</TransactionSignature><123abc>something</123abc></DocDetails>, because that would simply be invalid XML.

I finally found the answer after much searching!
It is illegal for an element name to start with a digit, as explained in this Stack Overflow answer.
Originally, this line was throwing a TransformerException:
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
Since a name cannot start with a digit, the parser probably does not recognise "123" or "123abc" as an element name.
That means the expression is effectively read as "/DocDetails/" rather than "/DocDetails/123" or "/DocDetails/123abc",
so no valid location step follows the final '/', and the TransformerException is thrown.
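The behaviour described in this thread can be checked with a small probe that only reports whether an expression compiles. This is a minimal sketch, assuming the JDK's built-in XPath implementation (the one that produced the error in the question); the class XPathCompileCheck and the method compiles are names invented for illustration.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Hypothetical helper class, named for this sketch only.
class XPathCompileCheck {
    // Returns true if the expression compiles, false if the parser rejects it.
    static boolean compiles(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            xPath.compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            // XPath.compile reports syntax errors through this checked exception.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("/DocDetails/abc"));    // a name test that starts with a letter
        System.out.println(compiles("/DocDetails/123abc")); // a step that starts with a digit
        // Comparing the name inside a predicate keeps the value out of the step syntax.
        System.out.println(compiles("/DocDetails/*[name()='123abc']"));
    }
}
```

When the element name comes from data, as x does in the question, moving it into a predicate like the last expression avoids the syntax error, although such a query can never match in a well-formed document if the value is not a legal element name.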

Java XPath - select all nodes does not work with namespaces

I would like to select all matching elements (in my case //ab) from the given XML, but nothing is selected (the matches variable contains an empty list):
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//ab");
NodeList matches = (NodeList) expr.evaluate(new InputSource(new ByteArrayInputStream(
("<x:xml xmlns:x=\"yyy\">"+
"<x:ab>123</x:ab>"+
"</x:xml>").getBytes())), XPathConstants.NODESET);
if (matches != null)
{
    for (int i = 0; i < matches.getLength(); i++)
    {
        Node node = (Node) matches.item(i);
        System.out.println(node.getNodeName());
        System.out.println(node.getNodeValue());
    }
}
Interestingly, when I change the XPath to *//ab, it does select the ab element.
Also, when I remove the namespace from the XML, the XPath //ab does select the ab element.
Likewise, when I add a NamespaceContext to the xpath object and add the prefix to the query (//x:ab), it selects the ab element.
How should I change the code so that I get the ab element without changing the query?
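No answer to this question is included here. One known workaround does change the query: matching on local-name() ignores the namespace entirely. The sketch below assumes a namespace-aware DOM parse; the class LocalNameSketch and the method firstAbText are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class LocalNameSketch {
    // Returns the text of the first element whose local name is "ab", or null if none exists.
    static String firstAbText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // so that x:ab gets the local name "ab"
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() compares only the part after the prefix, whatever the namespace is.
            NodeList matches = (NodeList) xpath.evaluate(
                    "//*[local-name()='ab']", doc, XPathConstants.NODESET);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

The sketch reads the text with getTextContent() because getNodeValue(), as used in the question's loop, returns null for element nodes.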

parse xml using dom java

I have the XML below:
<modelingOutput>
<listOfTopics>
<topic id="1">
<token id="354">wish</token>
</topic>
</listOfTopics>
<rankedDocs>
<topic id="1">
<documents>
<document id="1" numWords="0"/>
<document id="2" numWords="1"/>
<document id="3" numWords="2"/>
</documents>
</topic>
</rankedDocs>
<listOfDocs>
<documents>
<document id="1">
<topic id="1" percentage="4.790644689978203%"/>
<topic id="2" percentage="11.427632949428334%"/>
<topic id="3" percentage="17.86913349249596%"/>
</document>
</documents>
</listOfDocs>
</modelingOutput>
I want to parse this XML file and get the topic id and percentage from listOfDocs.
My first approach is to get all document elements from the XML and then check whether the grandparent node is listOfDocs.
But document elements exist in both rankedDocs and listOfDocs, so I end up with a very large list.
Is there a better way to parse this XML that avoids the if statement?
My code:
public void parse(){
    Document dom = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource is = new InputSource(new StringReader(xml));
    dom = db.parse(is);
    Element doc = dom.getDocumentElement();
    NodeList documentnl = doc.getElementsByTagName("document");
    for (int i = 0; i < documentnl.getLength(); i++) {
        Node item = documentnl.item(i);
        Node parentNode = item.getParentNode();
        Node grandpNode = parentNode.getParentNode();
        if(grandpNode.getNodeName() == "listOfDocs"){
            //get value
        }
    }
}

First, when checking the node name you shouldn't compare Strings using ==. Always use the equals method instead.
You can use XPath to evaluate only the document topic elements under listOfDocs:
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//listOfDocs//document/topic");
NodeList topicnl = (NodeList) xPathExpression.evaluate(dom, XPathConstants.NODESET);
for(int i = 0; i < topicnl.getLength(); i++) {
...

If you do not want to use the if statement you can use XPath to get the element you need directly.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("source.xml");
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/*/listOfDocs/documents/document/topic");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getAttributes().getNamedItem("id"));
    System.out.println(nodes.item(i).getAttributes().getNamedItem("percentage"));
}
Please check GitHub project here.
Hope this helps.
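Building on the XPath approach above, the attribute values can also be read as strings and the percentage converted to a number. This is a rough sketch under a few assumptions: the class TopicPercentages, the method read, and the "documentId:topicId" key format are invented for illustration, and the percentage is assumed to always end with a % sign as in the sample XML.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class TopicPercentages {
    // Maps "documentId:topicId" to the percentage as a double, for topics under listOfDocs only.
    static Map<String, Double> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList topics = (NodeList) xpath.evaluate(
                    "/modelingOutput/listOfDocs/documents/document/topic",
                    doc, XPathConstants.NODESET);

            Map<String, Double> result = new LinkedHashMap<>();
            for (int i = 0; i < topics.getLength(); i++) {
                Element topic = (Element) topics.item(i);
                Element document = (Element) topic.getParentNode();
                // Strip the trailing '%' before parsing, e.g. "4.79%" becomes 4.79.
                String raw = topic.getAttribute("percentage").replace("%", "");
                String key = document.getAttribute("id") + ":" + topic.getAttribute("id");
                result.put(key, Double.parseDouble(raw));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read topic percentages", e);
        }
    }
}
```

Because the path starts at listOfDocs, the topic elements under rankedDocs are never visited, so no grandparent check is needed.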

I like to use XMLBeam for such tasks:
public class Answer {
    @XBDocURL("resource://data.xml")
    public interface DataProjection {
        public interface Topic {
            @XBRead("./@id")
            int getID();
            @XBRead("./@percentage")
            String getPercentage();
        }
        @XBRead("/modelingOutput/listOfDocs//document/topic")
        List<Topic> getTopics();
    }
    public static void main(final String[] args) throws IOException {
        final DataProjection dataProjection = new XBProjector().io().fromURLAnnotation(DataProjection.class);
        for (Topic topic : dataProjection.getTopics()) {
            System.out.println(topic.getID() + ": " + topic.getPercentage());
        }
    }
}
There is even a convenient way to convert the percentage to float or double. Let me know if you would like an example.

Retrieve value of XML node and node attribute using XPath in JAXP

Given an xml document that looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="agentType">STANDARD</entry>
<entry key="DestinationTransferStates"></entry>
<entry key="AgentStatusPublishRate">300</entry>
<entry key="agentVersion">f000-703-GM2-20101109-1550</entry>
<entry key="CommandTimeUTC">2010-12-24T02:25:43Z</entry>
<entry key="PublishTimeUTC">2010-12-24T02:26:09Z</entry>
<entry key="queueManager">AGENTQMGR</entry>
</properties>
I want to print the values of the "key" attribute and the element so it looks like this:
agentType = STANDARD
DestinationTransferStates =
AgentStatusPublishRate = 300
agentVersion = f000-703-GM2-20101109-1550
CommandTimeUTC = 2010-12-24T02:25:43Z
PublishTimeUTC = 2010-12-24T02:26:09Z
queueManager = AGENTQMGR
I'm able to print the node values with no problem using this code:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//properties/entry/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
And I can print the values of the "key" attribute by changing the xpath expression and the node methods as follows:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//properties/entry");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getAttributes().getNamedItem("key").getNodeValue());
}
It seems like there would be a way to get at both of these values in a single evaluate. I could always evaluate two NodeLists and iterate through them with a common index but I'm not sure they are guaranteed to be returned in the same order. Any suggestions appreciated.

What about getTextContent()? With the //properties/entry expression from your second snippet, this should do the job:
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++)
{
    Node currentItem = nodes.item(i);
    String key = currentItem.getAttributes().getNamedItem("key").getNodeValue();
    String value = currentItem.getTextContent();
    System.out.printf("%s = %s%n", key, value);
}
For more information, see the Javadoc for getTextContent(). I hope this helps.
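As a variation on the loop above, each entry node can also serve as the context for small relative XPath expressions, which keeps key and value paired without relying on two parallel lists. This is a minimal sketch; the class EntryPairs and the method readEntries are hypothetical names, and the sample input is assumed to have no DOCTYPE so that the parser does not try to fetch a DTD.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class EntryPairs {
    // Returns key -> text for every /properties/entry element, in document order.
    static Map<String, String> readEntries(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList entries = (NodeList) xpath.evaluate(
                    "/properties/entry", doc, XPathConstants.NODESET);

            Map<String, String> pairs = new LinkedHashMap<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Node entry = entries.item(i);
                // Both expressions are evaluated relative to the current entry node
                // and returned as strings by this evaluate overload.
                String key = xpath.evaluate("@key", entry);
                String value = xpath.evaluate(".", entry);
                pairs.put(key, value);
            }
            return pairs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read entries", e);
        }
    }
}
```

Iterating the returned map and printing key + " = " + value gives the layout requested in the question.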

Using SAXON Xpath engine in Java

Here is my code:
public static void main(String[] args) {
    // System.setProperty(
    //     "javax.xml.xpath.XPathFactory",
    //     "net.sf.saxon.xpath.XPathFactoryImpl");
    String xml="<root><a>#BBB#</a><a>#CCC#</a><b><a>#DDD#</a></b></root>";
    try{
        JDocument dom = new JDocument(xml);
        XPathFactory factory = net.sf.saxon.xpath.XPathFactoryImpl.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile("//a[matches(.,'#...#')]");
        Object result = expr.evaluate(dom, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        Nodes sharped = new Nodes(nodes);
        for (Node n:sharped){
            System.out.println(n.toString());
        }
    }
    catch(Exception e){
        e.printStackTrace();
    }
}
And I get this:
javax.xml.transform.TransformerException: Impossible to find the function : matches
at org.apache.xpath.compiler.XPathParser.error(XPathParser.java:608)
at org.apache.xpath.compiler.XPathParser.FunctionCall(XPathParser.java:1505)
at org.apache.xpath.compiler.XPathParser.PrimaryExpr(XPathParser.java:1444)
at org.apache.xpath.compiler.XPathParser.FilterExpr(XPathParser.java:1343)
at org.apache.xpath.compiler.XPathParser.PathExpr(XPathParser.java:1276)
This means Java is using the org.apache.xpath.compiler.XPathParser class, even though I clearly created my factory through net.sf.saxon.xpath.XPathFactoryImpl.
(I actually only need to use matches in my XPath expressions, so any solution that does not involve Saxon would also meet my need.)
What am I doing wrong?

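newInstance() is a static method inherited from XPathFactory, so calling it through net.sf.saxon.xpath.XPathFactoryImpl still goes through the standard JAXP lookup and returns the default implementation. From the Saxon examples:
System.setProperty("javax.xml.xpath.XPathFactory:"+NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory xpf = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
This works fine.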
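Since the question also asks for a solution that does not involve Saxon, one option is to select the candidate nodes with plain XPath 1.0 and apply the regular expression in Java. This is a minimal sketch of that idea rather than a replacement for matches(): it uses java.util.regex syntax, which differs in places from XPath 2.0 regex syntax, and the names RegexNodeFilter and matchingText are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class RegexNodeFilter {
    // Returns the text of every node selected by xpathExpr whose text contains a match for regex.
    static List<String> matchingText(String xml, String xpathExpr, String regex) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList candidates = (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);

            Pattern pattern = Pattern.compile(regex);
            List<String> hits = new ArrayList<>();
            for (int i = 0; i < candidates.getLength(); i++) {
                String text = candidates.item(i).getTextContent();
                // find() looks for the pattern anywhere in the text, like an unanchored matches().
                if (pattern.matcher(text).find()) {
                    hits.add(text);
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException("Could not filter nodes", e);
        }
    }
}
```

For the XML in the question, calling matchingText(xml, "//a", "#...#") would keep the three a elements whose text has the form #XXX#.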
