Using the Saxon XPath engine in Java

Here is my code:
public static void main(String[] args) {
// System.setProperty(
// "javax.xml.xpath.XPathFactory",
// "net.sf.saxon.xpath.XPathFactoryImpl");
String xml="<root><a>#BBB#</a><a>#CCC#</a><b><a>#DDD#</a></b></root>";
try{
JDocument dom = new JDocument(xml);
XPathFactory factory = net.sf.saxon.xpath.XPathFactoryImpl.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//a[matches(.,'#...#')]");
Object result = expr.evaluate(dom, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
Nodes sharped = new Nodes(nodes);
for (Node n:sharped){
System.out.println(n.toString());
}
}
catch(Exception e){
e.printStackTrace();
}
}
And I get this:
javax.xml.transform.TransformerException: Impossible to find the function : matches
at org.apache.xpath.compiler.XPathParser.error(XPathParser.java:608)
at org.apache.xpath.compiler.XPathParser.FunctionCall(XPathParser.java:1505)
at org.apache.xpath.compiler.XPathParser.PrimaryExpr(XPathParser.java:1444)
at org.apache.xpath.compiler.XPathParser.FilterExpr(XPathParser.java:1343)
at org.apache.xpath.compiler.XPathParser.PathExpr(XPathParser.java:1276)
This means Java is using the org.apache.xpath.compiler.XPathParser class, even though I clearly created my factory through net.sf.saxon.xpath.XPathFactoryImpl.
(All I actually need is to use matches in my XPath expressions, so any solution that does not involve Saxon would also meet my need.)
What am I doing wrong?

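From the Saxon examples:
System.setProperty("javax.xml.xpath.XPathFactory:"+NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory xpf = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
This works fine. In the question's code, net.sf.saxon.xpath.XPathFactoryImpl.newInstance() resolves to the static XPathFactory.newInstance() method inherited from the base class, so it returns whatever default factory JAXP locates (here the Xalan-based one seen in the stack trace), not Saxon's.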
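If regex filtering is the only goal and Saxon is optional, one route that stays on the JDK's built-in XPath 1.0 engine is to select the candidate nodes with XPath and apply the regular expression in Java. The following is a minimal sketch under that assumption; the class and method names are made up for illustration, and Matcher.find() is used because XPath 2.0's matches() is also unanchored unless the pattern contains ^ and $.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RegexNodeFilter {

    // Hypothetical helper: selects nodes with a plain XPath 1.0 expression, then
    // keeps the text of those nodes in which the regex finds a match.
    static List<String> textsMatching(String xml, String xpathExpr, String regex) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);

            Pattern pattern = Pattern.compile(regex);
            List<String> matches = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent();
                if (pattern.matcher(text).find()) {
                    matches.add(text);
                }
            }
            return matches;
        } catch (Exception e) {
            // Checked parser/XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><a>#BBB#</a><a>#CCC#</a><b><a>#DDD#</a></b><a>plain</a></root>";
        // prints [#BBB#, #CCC#, #DDD#]
        System.out.println(textsMatching(xml, "//a", "#...#"));
    }
}
```

The trade-off is that the regex can no longer sit in the middle of a larger XPath expression; when that is needed, an XPath 2.0 engine such as Saxon remains the more direct choice.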

XPath compiling behaviour

I am testing my application and realised that behaviour is different when compiling.
For example, if my expression to compile is :
XPathExpression expr = xPath.compile("/DocDetails/TransactionSignature");
And :
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
x is declared as a String.
Let's say that x in expr2 is "abc"; the XPathExpression compiles with no issues.
But if x in expr2 is "123abc" or "123", compiling the XPathExpression throws:
javax.xml.transform.TransformerException: A location step was expected
following the '/' or '//' token.
I'm just curious about this behaviour.
Here is the full code for reference:
String document = "C:/Users/Eunice/Documents/MITS/doc.xml";
String document2 = "C:/Users/Eunice/Documents/MITS/doc2.xml";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(document);
Document doc2 = builder.parse(document2);
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression expr = xPath.compile("/DocDetails/TransactionSignature");
Node node = (Node)expr.evaluate(doc, XPathConstants.NODE);
String x = node.getTextContent();
System.out.println(x);
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
Node node2 = (Node)expr2.evaluate(doc2, XPathConstants.NODE);
if (node2 == null)
System.out.println("null");
else
System.out.println("not null " + node2.getTextContent());
And this is the XML file:
<DocDetails>
<TransactionSignature>abc123</TransactionSignature>
</DocDetails>
Regarding "But if x in expr2 is "123abc" OR "123", XPathExpression throws a ...":
An XML element name cannot start with a digit. Hence your example is equivalent to
XPathExpression expr2 = xPath.compile("/DocDetails/123abc");
I guess the XPath parser does not expect that.
You should also provide the full XML. I am fairly sure it does not contain anything like <DocDetails><TransactionSignature>abc123</TransactionSignature><123abc>something</123abc></DocDetails>, because that is simply invalid XML.
I finally found the answer after much searching!
It is actually illegal for an element name to start with a digit, as can be seen in this stackoverflow answer.
Originally, this line was throwing a transformer exception:
XPathExpression expr2 = xPath.compile("/DocDetails/" + x);
Since a name cannot start with a digit, the parser probably does not read "123" or "123abc" as an element name at all.
That means the expression is effectively read as "/DocDetails/" rather than "/DocDetails/123" or "/DocDetails/123abc",
leaving a trailing '/' with no location step after it, hence the transformer exception.
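When the child name comes from data, as x does here, one way to avoid the compile-time failure is to keep the name out of the expression text and compare it as a string instead. The sketch below assumes the JDK's built-in XPath engine; the class, the helper names, and the $childName variable are hypothetical. It passes the name through an XPathVariableResolver, so a value such as "123abc" simply matches nothing rather than breaking the parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ChildByName {

    // True if the XPath engine accepts the expression at compile time.
    static boolean compiles(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    // Hypothetical helper: text of the child of /DocDetails whose name equals
    // childName, or "" if there is none. The name is supplied as an XPath
    // variable, so it is never parsed as part of the expression itself.
    static String childText(String xml, String childName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Set the resolver before the expression is compiled or evaluated.
            xpath.setXPathVariableResolver(
                    name -> "childName".equals(name.getLocalPart()) ? childName : null);
            return xpath.evaluate("/DocDetails/*[name() = $childName]", doc);
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DocDetails><abc123>hello</abc123></DocDetails>";
        System.out.println(compiles("/DocDetails/abc123"));   // a valid name test
        System.out.println(compiles("/DocDetails/123abc"));   // the failing case from the question
        System.out.println(childText(xml, "abc123"));
        System.out.println(childText(xml, "123abc").isEmpty());
    }
}
```

Passing the value as a variable also sidesteps quoting problems that would arise from concatenating arbitrary text into the expression.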

Correct XPathExpression for retrieving Nodes in Java?

I have an XML document that has multiple hhp:HourlyHistoricalPrice elements, as follows:
<?xml version="1.0"?>
<hhp:HourlyHistoricalPrices xmlns:hhp="urn:or-HourlyHistoricalPrices">
<hhp:HourlyHistoricalPrice xmlns:hhp="urn:or-HourlyHistoricalPrice">
<hhp:indexId>1025127</hhp:indexId>
<hhp:resetDate>20161231T000000</hhp:resetDate>
<hhp:refSource>AIBO</hhp:refSource>
<hhp:indexLocation/>
<hhp:price1>50,870000</hhp:price1>
...
<hhp:price48>43,910000</hhp:price48>
</hhp:HourlyHistoricalPrice>
<hhp:HourlyHistoricalPrice xmlns:hhp="urn:or-HourlyHistoricalPrice">
<hhp:indexId>1025127</hhp:indexId>
<hhp:resetDate>20160101T000000</hhp:resetDate>
<hhp:refSource>AIBO</hhp:refSource>
<hhp:indexLocation/>
<hhp:price1>51,870000</hhp:price1>
...
<hhp:price48>49,910000</hhp:price48>
</hhp:HourlyHistoricalPrice>
<hhp:HourlyHistoricalPrice xmlns:hhp="urn:or-HourlyHistoricalPrice">
<hhp:indexId>1025127</hhp:indexId>
<hhp:resetDate>20163112T000000</hhp:resetDate>
<hhp:refSource>APX</hhp:refSource>
<hhp:indexLocation/>
<hhp:price1>63,870000</hhp:price1>
...
<hhp:price48>29,910000</hhp:price48>
</hhp:HourlyHistoricalPrice>
</hhp:HourlyHistoricalPrices>
I want to retrieve only the hhp:HourlyHistoricalPrice nodes that have a particular value for hhp:refSource, e.g. AIBO.
I tried the XPathExpression below, but it retrieves nothing.
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
String strExprssion =
"/hhp:HourlyHistoricalPrices/hhp:HourlyHistoricalPrice[hhp:refSource='AIBO']";
XPathExpression expression = xpath.compile(strExprssion);
NodeList nodes = (NodeList) expression.evaluate(originalXmlDoc, XPathConstants.NODESET);
System.out.println(nodes.getLength());
I would be grateful if somebody could provide advice on the correct expression to use.
Thanks a lot.
You need to expand the prefix into the xml namespace it represents:
String strExprssion = "//urn:or-HourlyHistoricalPrice:HourlyHistoricalPrice[urn:or-HourlyHistoricalPrice:refSource='AIBO']";
So for me, this test class
public class XPathCheck {
public static void main(String[] args) throws FileNotFoundException, IOException, XPathExpressionException {
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
try (InputStream file = new FileInputStream(Paths.get("src", "inputFile.xml").toFile())) {
String strExprssion = "//urn:or-HourlyHistoricalPrice:HourlyHistoricalPrice[urn:or-HourlyHistoricalPrice:refSource='AIBO']";
XPathExpression expression = xpath.compile(strExprssion);
NodeList nodes = (NodeList) expression.evaluate(new InputSource(file), XPathConstants.NODESET);
System.out.println(nodes.getLength());
}
}
}
outputs "2".
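A more portable route, which does not depend on how a particular XPath engine tokenizes a URI written inside a name, is to parse with a namespace-aware DocumentBuilderFactory and bind prefixes through a NamespaceContext. The sketch below is written under those assumptions: the prefixes list and item and the class name are arbitrary choices for illustration. Two prefixes are needed because the sample document rebinds hhp to a different URI on each hhp:HourlyHistoricalPrice element.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespacedPriceQuery {

    // Hypothetical helper: counts the price elements whose refSource is AIBO.
    static int countAibo(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default; required for prefixed name tests
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Prefixes used in the XPath; they need not match those in the document.
            Map<String, String> uris = new HashMap<>();
            uris.put("list", "urn:or-HourlyHistoricalPrices");
            uris.put("item", "urn:or-HourlyHistoricalPrice");

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return uris.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }

                @Override
                public String getPrefix(String namespaceUri) {
                    return null; // reverse lookup is not needed for this query
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceUri) {
                    return Collections.emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    "/list:HourlyHistoricalPrices/item:HourlyHistoricalPrice[item:refSource='AIBO']",
                    doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<hhp:HourlyHistoricalPrices xmlns:hhp='urn:or-HourlyHistoricalPrices'>"
              + "<hhp:HourlyHistoricalPrice xmlns:hhp='urn:or-HourlyHistoricalPrice'>"
              + "<hhp:refSource>AIBO</hhp:refSource></hhp:HourlyHistoricalPrice>"
              + "<hhp:HourlyHistoricalPrice xmlns:hhp='urn:or-HourlyHistoricalPrice'>"
              + "<hhp:refSource>APX</hhp:refSource></hhp:HourlyHistoricalPrice>"
              + "</hhp:HourlyHistoricalPrices>";
        System.out.println(countAibo(xml)); // prints 1
    }
}
```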

How to store XML into Java array

I have an XML with a list of beaches.
Each entry looks like:
<beach>
<name>Dor</name>
<longitude>32.1867</longitude>
<latitude>34.6077</latitude>
</beach>
I am using Jsoup to read this XML into a Document doc.
Is there an easy way to handle this data?
I would like to be able to do something like this:
x = my_beach_list["Dor"].longitude;
Currently I have left it in the Jsoup Document and I am using:
x = get_XML_val(doc, "Dor", "longitude");
With get_XML_val defined as:
private String get_XML_val(Document doc, String element_name, String element_attr) {
Elements beaches = doc.select("beach");
Elements one_node = beaches.select("beach:matches(" + element_name + ")");
Element node_attr = one_node.select(element_attr).first();
String t = node_attr.text();
return t;
}
Thanks
Ori
Java is an object-oriented language, and you would benefit a lot from using that fact. For instance, you can parse your XML into a Java data structure once and use that as a lookup, e.g.:
class Beach {
private String name;
private double longitude;
private double latitude;
// constructor;
// getters and setters;
}
Now that the class is set up, you can parse your XML into a list of Beach objects:
List<Beach> beaches = new ArrayList<>();
// for every beach object in the xml
beaches.add(new Beach(val_from_xml, ..., ...));
When you want to find a specific object, you can query the collection:
beaches.stream().filter(beach -> "Dor".equals(beach.getName())).findFirst();
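To get closer to the my_beach_list["Dor"].longitude style of access, the parsed objects can be keyed by name in a Map. The sketch below is one possible shape and not the only one: it uses the JDK's DOM parser instead of Jsoup, the class and method names are invented for illustration, and it assumes every beach element carries name, longitude and latitude children.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BeachIndex {

    // Immutable value object for one <beach> entry.
    static final class Beach {
        final String name;
        final double longitude;
        final double latitude;

        Beach(String name, double longitude, double latitude) {
            this.name = name;
            this.longitude = longitude;
            this.latitude = latitude;
        }
    }

    // Parses every <beach> element once and indexes the results by name.
    static Map<String, Beach> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("beach");
            Map<String, Beach> byName = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element beach = (Element) nodes.item(i);
                String name = childText(beach, "name");
                byName.put(name, new Beach(
                        name,
                        Double.parseDouble(childText(beach, "longitude")),
                        Double.parseDouble(childText(beach, "latitude"))));
            }
            return byName;
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    // Assumes the child element exists; a missing one fails with a NullPointerException.
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // The <beaches> wrapper element is an assumption about the full file.
        String xml = "<beaches><beach><name>Dor</name>"
                + "<longitude>32.1867</longitude><latitude>34.6077</latitude></beach></beaches>";
        Map<String, Beach> beaches = parse(xml);
        System.out.println(beaches.get("Dor").longitude); // prints 32.1867
    }
}
```

A map lookup is constant-time per query, whereas filtering a list or re-running a selector scans the data on each call.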
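You can use an XPath evaluator to run queries on the XML directly and get the results. For example, the XPath expression to get the longitude of the beach named Dor is: /beach[name="Dor"]/longitude (this assumes a single beach element is the document root; if the beaches are wrapped in another element, //beach[name="Dor"]/longitude would be needed instead).
The corresponding Java code to evaluate the XPath expression can be written as: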
private static void getValue(String xml) {
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new InputSource(new StringReader(xml))); // StringBufferInputStream is deprecated
doc.getDocumentElement().normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("/beach[name=\"Dor\"]/longitude").evaluate(doc,
XPathConstants.NODESET);
System.out.println(nodeList.item(0).getTextContent());
} catch (IOException | ParserConfigurationException | XPathExpressionException | SAXException exception) {
exception.printStackTrace();
}
}
The above method prints the longitude value: 32.1867
More on Xpaths here.

Looping over nodes and extracting specific subnode values using Java's XPath

I understand from Googling that it makes more sense to extract data from XML using XPath than by using DOM looping.
At the moment, I have implemented a solution using DOM, but the code is verbose, and it feels untidy and unmaintainable, so I would like to switch to a cleaner XPath solution.
Let's say I have this structure:
<products>
<product>
<title>Some title 1</title>
<image>Some image 1</image>
</product>
<product>
<title>Some title 2</title>
<image>Some image 2</image>
</product>
...
</products>
I want to be able to run a for loop for each of the <product> elements, and inside this for loop, extract the title and image node values.
My code looks like this:
InputStream is = conn.getInputStream();
DocumentBuilder builder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(is);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
Node n = products.item(i);
if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
Element product = (Element) n;
// do some DOM navigation to get the title and image
}
}
Inside my for loop I get each <product> as a Node, which is cast to an Element.
Can I simply use my instance of XPathExpression to compile and run another XPath on the Node or the Element?
Yes, you can always do it like this:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
expr = xpath.compile("title"); // The new xpath expression to find 'title' within 'product'.
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
Node n = products.item(i);
if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
Element product = (Element) n;
NodeList nodes = (NodeList) expr.evaluate(product,XPathConstants.NODESET); //Find the 'title' in the 'product'
System.out.println("TITLE: " + nodes.item(0).getTextContent()); // And here is the title
}
}
Here I have given an example of extracting the 'title' value. You can do the same for 'image'.
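For completeness, here is a self-contained sketch that reads both values per product. It assumes the structure from the question, and the class and method names are invented for illustration. The key detail is that the inner expressions are relative ("title", not "/title"), so they are evaluated against the current product node instead of the document root.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ProductReader {

    // Returns one "title | image" string per <product> element.
    static List<String> titlesAndImages(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Compile once, reuse for every product.
            XPathExpression productExpr = xpath.compile("/products/product");
            XPathExpression titleExpr = xpath.compile("title");
            XPathExpression imageExpr = xpath.compile("image");

            NodeList products = (NodeList) productExpr.evaluate(doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++) {
                Node product = products.item(i);
                // evaluate(Object) returns the string value of the first matching node.
                String title = titleExpr.evaluate(product);
                String image = imageExpr.evaluate(product);
                rows.add(title + " | " + image);
            }
            return rows;
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<products>"
                + "<product><title>Some title 1</title><image>Some image 1</image></product>"
                + "<product><title>Some title 2</title><image>Some image 2</image></product>"
                + "</products>";
        titlesAndImages(xml).forEach(System.out::println);
    }
}
```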
I'm not a big fan of this approach because you have to build a document (which might be expensive) before you can apply XPaths to it.
I've found VTD-XML a lot more efficient when it comes to applying XPaths to documents, because you don't need to load the whole document into memory. Here is some sample code:
final VTDGen vg = new VTDGen();
vg.parseFile("file.xml", false);
final VTDNav vn = vg.getNav();
final AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/products/product");
while (ap.evalXPath() != -1) {
System.out.println("PRODUCT:");
// you could either apply another xpath or simply get the first child
if (vn.toElement(VTDNav.FIRST_CHILD, "title")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Title: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
if (vn.toElement(VTDNav.FIRST_CHILD, "image")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Image: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
}
Also see this post on Faster XPaths with VTD-XML.

How to remove elements of a page in htmlunit

Normally in PHP, I would just parse the old document and write to the new document while ignoring the unwanted elements.
This was the first solution I came up with:
DocumentBuilder builder = DocumentBuilderFactory
.newInstance()
.newDocumentBuilder();
StringReader reader = new StringReader( xml );
Document document = builder.parse( new InputSource(reader) );
XPathExpression expr = XPathFactory
.newInstance()
.newXPath()
.compile( ... );
Object result = expr.evaluate(document, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
// remove each match from its own parent; it may not be a direct child of the root element
node.getParentNode().removeChild(node);
}
As you can see it's kinda long. Being a coder who strives for simplicity, I decided to take Ahmed's advice hoping I'll find a better solution and I came up with this:
List<?> elements = page.getByXPath( ... );
DomNode node = null;
for( Object o : elements ) {
node = (DomNode)o;
node.getParentNode().removeChild( node );
}
Please note these are just snippets, I omitted the imports and the XPath expressions but you get the idea.
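For the plain DOM variant (the first snippet), a runnable version might look like the sketch below. It uses only the JDK, not HtmlUnit; the class name, the method name and the ad element in the sample are placeholders, since the original XPath expressions were omitted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NodeRemover {

    // Parses the XML, removes every node matched by the XPath expression,
    // and returns the modified document.
    static Document removeAll(String xml, String xpathExpr) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpathExpr, document, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                Node parent = node.getParentNode();
                // Detach from the actual parent, which may be deeper than the root element.
                if (parent != null) {
                    parent.removeChild(node);
                }
            }
            return document;
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><ad>x</ad><p>keep</p><div><ad>y</ad></div></root>";
        Document cleaned = removeAll(xml, "//ad");
        System.out.println(cleaned.getDocumentElement().getTextContent()); // prints keep
    }
}
```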
Have a look at the DOM methods; you can remove nodes with them.
http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/html/DomNode.html
