I'm trying to use an XPathFactory to evaluate an expression in a Java application. But I'm getting a Saxon-specific error. At one time I used Saxon for some functionality, and to do that I had to set a system property:
System.setProperty("javax.xml.xpath.XPathFactory:" + NamespaceConstant.OBJECT_MODEL_SAXON,
"net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory xpf = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
However, now I just want to do some XML processing using the default DOM (org.w3c.dom.Document) and process with xpath, so Saxon isn't needed. But when I try to create an XPathFactory I still get the Saxon error message:
Exception in thread "AWT-EventQueue-0" java.lang.NoClassDefFoundError: net/sf/saxon/lib/EnvironmentVariableResolver
at net.sf.saxon.xpath.XPathFactoryImpl.<init>(XPathFactoryImpl.java:26)
...
I even tried "resetting" the system property:
System.setProperty("javax.xml.xpath.XPathFactory:",
"org.apache.xpath.jaxp.XPathFactoryImpl");
XPathFactory factory = XPathFactory.newInstance();
And
System.setProperty("javax.xml.xpath.XPathFactory:",
"http://java.sun.com/jaxp/xpath/dom");
XPathFactory factory = XPathFactory.newInstance();
But that doesn't help, I still get the same error message.
How do I get rid of this in order to use the default XPathFactory again? (this has worked fine before I tried using Saxon)
As a workaround, you can explicitly instantiate the factory you want (the JDK's, Xalan's, or Saxon's).
import org.apache.xpath.jaxp.XPathFactoryImpl;
// import com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl
// import net.sf.saxon.xpath.XPathFactoryImpl
...
XPathFactory factory = new XPathFactoryImpl();
If possible, prefer the standalone Apache implementation (org.apache.xpath.jaxp.XPathFactoryImpl, which ships with Xalan) to the copy bundled in the JDK. It is more reliable.
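If importing an implementation class directly feels too brittle, here is a minimal sketch that stays within the public JAXP API. It assumes Java 9 or later, where XPathFactory.newDefaultInstance() returns the JDK's built-in factory without consulting system properties or the classpath. The class and method names (DefaultXPathDemo, builtInFactory, evaluate) are made up for illustration, and the caller is assumed to pass the same object model URI that was used in the earlier System.setProperty call.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class DefaultXPathDemo {

    // Clears a leftover factory override for the given object model, then
    // returns the JDK's built-in factory (requires Java 9+).
    static XPathFactory builtInFactory(String objectModelUri) {
        System.clearProperty(XPathFactory.DEFAULT_PROPERTY_NAME + ":" + objectModelUri);
        return XPathFactory.newDefaultInstance();
    }

    // Evaluates the expression against an XML string and returns the string value.
    static String evaluate(String expression, String xml) {
        XPath xpath = builtInFactory(XPathFactory.DEFAULT_OBJECT_MODEL_URI).newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/root/@id", "<root id='12345'/>")); // prints 12345
    }
}
```

Clearing the property on its own may not be enough, since a Saxon JAR on the classpath can still be picked up by XPathFactory.newInstance(); newDefaultInstance() sidesteps that lookup.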
I have run into the same problem. Even if System.setProperty is never called, JAXP will load Saxon's XPath engine as the default implementation, provided the Saxon JAR is on the classpath. Reference: namespace-unaware XPath expression fails if Saxon is on the CLASSPATH.
My solution: instantiate Saxon directly with "XPathFactory _xFactory = new net.sf.saxon.xpath.XPathFactoryImpl();" and put jaxen-xxx.jar and xercesImpl.jar before saxon9e.jar on the classpath. Everything else stays as it was, with no call to System.setProperty. This works for me.
I also tested another method:
System.setProperty("javax.xml.xpath.XPathFactory:" +XPathConstants.DOM_OBJECT_MODEL, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory xFactory = XPathFactory.newInstance(XPathConstants.DOM_OBJECT_MODEL);
System.setProperty(XPathFactory.DEFAULT_PROPERTY_NAME +":" + XPathFactory.DEFAULT_OBJECT_MODEL_URI, " org.apache.xpath.jaxp.XPathFactoryImpl");
XPathFactory xFactory2 = XPathFactory.newInstance();
System.out.println(xFactory.toString());
System.out.println(xFactory2.toString());
The output:
net.sf.saxon.xpath.XPathFactoryImpl@71623278
com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl@768b970c
Since JAXP uses Apache's Xalan-derived XPath engine as its default implementation, this method should work too.
I am trying to query an XML file using miscellaneous xpaths with Saxonica API from net.sf.saxon but it seems that every time the query operations return results without xml tags - only the content. Is there a way to do this (straight-forward or work-around)?
To be more explicit:
For the xml file
<books>
<book lang="en">
<nrpages>140</nrpages>
<author>J.R.R.Tolkien</author>
</book>
</books>
and the xpath
//book
I would like to retrieve
<book lang="en">
<nrpages>140</nrpages>
<author>J.R.R.Tolkien</author>
</book>
instead of
140
J.R.R.Tolkien
What I've tried:
XPathFactory factory = new XPathFactoryImpl();
XPathExpression compiledXPath = factory.newXPath().compile(xPathExpression);
TinyNodeImpl nodeItem = (TinyNodeImpl) compiledXPath.evaluate(new InputSource(filename), XPathConstants.NODE);
nodeItem.atomize(); // brings only the content
nodeItem.getStringValue(); // brings only the content
The XPath expression returns a node; what you do with the node is then up to the calling application code. If you call node.getStringValue(), you will get the string value as defined in the XPath spec (that is, the same as calling fn:string() on the node within XPath). Similarly, the atomize() method follows the XPath spec for atomization (equivalent to fn:data() applied to the node.)
If you want the node to be serialized as lexical XML, there are various ways of achieving it. If you were to use Saxon's s9api interface instead of the JAXP interface, I would recommend XdmNode.toString(). Using the JAXP interface and then casting to internal Saxon classes gives you the worst of both worlds: you get all the problems of JAXP (e.g. weak typing, no XPath 2.0 support) with none of the benefits (portability across implementations). But if you prefer to do it this way, then the simplest way to serialize Saxon nodes is probably the static method QueryResult.serialize(NodeInfo). The 3-argument version of the method gives you full control over serialization properties such as indentation and adding an XML declaration.
With XPath 3.1 you can also invoke serialization within the XPath expression itself by calling fn:serialize(); this would avoid having to use any Saxon-specific classes and methods in the Java code.
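For comparison, here is a rough sketch of the same idea (select a node, then serialize it as lexical XML) using only JDK classes. Note that this swaps in the JDK's DOM and identity Transformer in place of Saxon's QueryResult.serialize or XdmNode.toString(), so it is limited to XPath 1.0. NodeSerializer and selectAsXml are hypothetical names, and the sketch assumes the XPathFactory on the classpath returns org.w3c.dom nodes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeSerializer {

    // Returns the first node selected by the expression as XML text,
    // including its tags, instead of its string value.
    static String selectAsXml(String expression, String xml) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODE);
            if (node == null) {
                throw new IllegalArgumentException("Nothing selected by " + expression);
            }
            // An identity transform copies the node to the output unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize result of " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book lang=\"en\"><nrpages>140</nrpages>"
                + "<author>J.R.R.Tolkien</author></book></books>";
        System.out.println(selectAsXml("//book", xml));
    }
}
```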
I am trying to run some XPath Queries on XML in Java and apparently the recommended way to do so is to construct a document first.
Here is the standard JAXP code sample that I was using:
import org.w3c.dom.Document;
import javax.xml.parsers.*;
final DocumentBuilder xmlParser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
final Document doc = xmlParser.parse(xmlFile);
I also tried the Saxon API, but got the same errors:
import net.sf.saxon.s9api.*;
final DocumentBuilder documentBuilder = new Processor(false).newDocumentBuilder();
final XdmNode xdm = documentBuilder.build(new File("out/data/blog.xml"));
Here is a minimal reconstructed example XML which the DocumentBuilder in JDK 1.8 can't parse:
<?xml version="1.1" encoding="UTF-8" ?>
<xml>
<![CDATA[Some example text with [funny highlight]]]>
</xml>
According to the spec, the square bracket ] just before the end of CDATA marker ]]> is perfectly legal, but the parser just exits with a stack trace and the message org.xml.sax.SAXParseException; XML document structures must start and end within the same entity..
On my original data file, which contains a lot of CDATA sections, the message is instead org.xml.sax.SAXParseException; The element type "item" must be terminated by the matching end-tag "</item>". In both cases com.sun.org.apache.xerces shows up in the stack trace a lot.
From both observations it seems as if the parser just didn't end the CDATA section at ]]>.
EDIT: As it turned out, the example will pass when the <?xml ... ?> declaration is omitted. I hadn't checked that before posting here and added it just now.
Short answer: add Apache Xerces to the build path, it will automatically be loaded instead of the parser from the JDK and the XML will be parsed just fine! Copy-paste Gradle Dependency:
implementation "xerces:xercesImpl:2.11.0"
Some background: Apache Xerces is indeed the same parser which is also used in the JDK, but even though Xerces 2.11 dates from 2013 the JDK comes with a much older version. That really sucks!
As the Saxon team puts it:
Saxonica recommends use of the Xerces parser from Apache in preference to the version bundled in the JDK, which is known to have some serious bugs.
In case you wonder how simply putting Xerces on the classpath makes the problem disappear: even though the JDK and Saxon DocumentBuilders construct entirely different document types, they both use the same standard Java interfaces (JAXP) to call the parser, and the same mechanism to find and load the parser (or rather, the parser factory). In short, a java.util.ServiceLoader looks through all the JARs on the classpath for provider-configuration files in META-INF/services, and this is how the Xerces JAR announces that it provides an XML parser. And good for us, the JDK's own implementation is superseded by anything found there.
After this bad experience with the JDK XML classes, I am even more motivated to refactor projects to use Saxon for XPath processing instead of the XPath implementation in the JDK. The other reason is the technical advantage of XDM over DOM (same link as above).
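To check which parser the lookup actually picked, a small probe like the following can help. It only prints the class names of the factories that JAXP resolves; ParserProbe is a made-up name. With nothing but the JDK on the classpath the names should sit under com.sun.org.apache.xerces.internal, and with xercesImpl added they are expected to sit under org.apache.xerces instead.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

class ParserProbe {

    // Class name of the DOM parser factory that JAXP resolves on this classpath.
    static String documentBuilderFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of the SAX parser factory that JAXP resolves on this classpath.
    static String saxParserFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(documentBuilderFactoryClass());
        System.out.println(saxParserFactoryClass());
    }
}
```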
In Java how do I evaluate XPATH expression on XML using SAX Parser?
I need a more dynamic way because the XML format is not fixed, so I should be able to pass the following:
xpath as string
xml as string / input source
Something like Utility.evaluate("/test/@id='123'", "")
Here is an example:
//First create a Document
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new File("test.xml"));
//Init the xpath factory
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/company/employee";
//read a nodelist using xpath
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
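Wrapped up as the kind of utility the question asks for (XPath as a string, XML as a string), the DOM approach might look like the sketch below. It builds the whole document in memory, so it is only suitable for input that fits there. XPathOnString and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathOnString {

    // Parses the XML string into a DOM tree and evaluates the expression on it.
    static Object evaluate(String expression, String xml, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    // Convenience wrapper returning the XPath string value of the result.
    static String evaluateString(String expression, String xml) {
        return (String) evaluate(expression, xml, XPathConstants.STRING);
    }

    // Convenience wrapper returning how many nodes the expression selects.
    static int countNodes(String expression, String xml) {
        return ((NodeList) evaluate(expression, xml, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) {
        System.out.println(evaluateString("/test/@id", "<test id='123'/>")); // prints 123
        System.out.println(countNodes("/company/employee",
                "<company><employee/><employee/></company>")); // prints 2
    }
}
```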
EDIT :
If you want to use a SAX parser, then you can't use the XPath object of Java, see https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/package-summary.html
The XPath language provides a simple, concise syntax for selecting nodes from an XML document. XPath also provides rules for converting a node in an XML document object model (DOM) tree to a boolean, double, or string value. XPath is a W3C-defined language and an official W3C recommendation; the W3C hosts the XML Path Language (XPath) Version 1.0 specification.
XPath started in life in 1999 as a supplement to the XSLT and XPointer languages, but has more recently become popular as a stand-alone language, as a single XPath expression can be used to replace many lines of DOM API code.
If you want to use SAX, you can look at the libraries detailed in this question: Is there any XPath processor for SAX model?
That said, XPath does not really suit SAX. A SAX parser does not build an XML tree in memory, so XPath cannot be evaluated efficiently: it cannot see nodes that have not been loaded.
Only a small subset of XPath is amenable to streamed evaluation, that is, evaluation on-the-fly while parsing the input document. There are therefore not many streaming XPath processors around; most of them are the product of academic research projects.
One thing you could try is Saxon-EE streamed XQuery. This is a small subset of XQuery that allows streamed execution (it will allow expressions like your example). Details at
http://www.saxonica.com/documentation/#!sourcedocs/streaming/streamed-query
Oracle's XQuery processor for Java will "dynamically" stream path expressions:
https://docs.oracle.com/database/121/ADXDK/adx_j_xqj.htm#ADXDK99930
Specifically, there is information on streaming here, including an example:
https://docs.oracle.com/database/121/ADXDK/adx_j_xqj.htm#ADXDK119
But it will not stream using SAX. You must bind the input XML as either StAX, InputStream, or Reader to get streaming evaluation.
You can use a SAXSource with XPath using Saxon, but - and this is important - be aware that the underlying implementation will almost certainly still be loading and buffering some or all of the document in memory in order to evaluate the xpath. It probably won't be a full DOM tree (Saxon relies on its own structure called TinyTree, which supports lazy-loading and various other optimizations), so it's better than using most DOM implementations, but it still involves loading the document into memory. If your concern is memory load for large data sets, it probably won't help you much, and you'd be better off using one of the streaming xpath/xquery options suggested by others.
An implementation of your utility method might look something like this:
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import org.xml.sax.InputSource;
import net.sf.saxon.xpath.XPathFactoryImpl;
public class XPathUtils {

    // Evaluates the XPath against the XML string, feeding Saxon through a SAX parser.
    public static Object evaluate(String xpath, String xml, QName returnType)
            throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource source = new InputSource(new StringReader(xml));
        SAXSource saxSource = new SAXSource(parser.getXMLReader(), source);
        // Saxon's factory is instantiated directly; it accepts a SAXSource as context.
        XPath xPath = new XPathFactoryImpl().newXPath();
        return xPath.evaluate(xpath, saxSource, returnType);
    }

    public static String xpathString(String xpath, String xml)
            throws Exception {
        return (String) evaluate(xpath, xml, XPathConstants.STRING);
    }

    public static boolean xpathBool(String xpath, String xml) throws Exception {
        return (Boolean) evaluate(xpath, xml, XPathConstants.BOOLEAN);
    }

    public static Number xpathNumber(String xpath, String xml) throws Exception {
        return (Number) evaluate(xpath, xml, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(xpathString("/root/@id", "<root id='12345'/>"));
    }
}
This works because the Saxon XPath implementation supports SAXSource as a context for evaluate(). Be aware that trying this with the built-in Apache XPath implementation will throw an exception.
I would like to extract an element called ServiceGroupId from the SOAP header, which specifies the session of the transaction. I need this so that I can direct the request to the same server using the SOAP session. My XML is as follows:
<?xml version="1.0" encoding="http://schemas.xmlsoap.org/soap/envelope/" standalone="no"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header xmlns:wsa="http://www.w3.org/2005/08/addressing">
<wsa:ReplyTo>
<wsa:Address>http://www.w3.org/2005/08/addressing/none</wsa:Address>
<wsa:ReferenceParameters>
<axis2:ServiceGroupId xmlns:axis2="http://ws.apache.org/namespaces/axis2">urn:uuid:99A029EBBC70DBEB221347349722532</axis2:ServiceGroupId>
</wsa:ReferenceParameters>
</wsa:ReplyTo>
<wsa:MessageID>urn:uuid:99A029EBBC70DBEB221347349722564</wsa:MessageID>
<wsa:Action>Perform some action</wsa:Action>
<wsa:RelatesTo>urn:uuid:63AD67826AA44DAE8C1347349721356</wsa:RelatesTo>
</soapenv:Header>
I would like to know how I could extract the Session GroupId from the above XML using Xpath.
You haven't specified a technology, so assuming that you haven't set up the equivalent of a .NET NameSpace manager or similar, you can use namespace agnostic Xpath as follows:
/*[local-name()='Envelope']/*[local-name()='Header']
/*[local-name()='ReplyTo']/*[local-name()='ReferenceParameters']
/*[local-name()='ServiceGroupId']/text()
Edit Updated for Java
Without namespace aliases
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expression = xpath.compile("/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='ReplyTo']/*[local-name()='ReferenceParameters']/*[local-name()='ServiceGroupId']/text()");
System.out.println(expression.evaluate(myXml));
With NamespaceContext
NamespaceContext context = new NamespaceContextMap(
"soapenv", "http://schemas.xmlsoap.org/soap/envelope/",
"wsa", "http://www.w3.org/2005/08/addressing",
"axis2", "http://ws.apache.org/namespaces/axis2");
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(context);
XPathExpression expression = xpath.compile("/soapenv:Envelope/soapenv:Header/wsa:ReplyTo/wsa:ReferenceParameters/axis2:ServiceGroupId/text()");
System.out.println(expression.evaluate(myXml));
local-name() gives the tag name of the element agnostic of its namespace.
Also, the encoding in your above xml document doesn't look right.
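NamespaceContextMap in the snippet above is not a JDK class, so here is a minimal stand-in built on the javax.xml.namespace.NamespaceContext interface. It is only a sketch: SoapHeaderReader, contextOf and serviceGroupId are names invented for this example, and the parser must be made namespace aware or the prefixed path will match nothing.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SoapHeaderReader {

    // Wraps a prefix-to-URI map as a NamespaceContext for use in XPath expressions.
    static NamespaceContext contextOf(Map<String, String> prefixToUri) {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }
            @Override public String getPrefix(String uri) {
                for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                    if (e.getValue().equals(uri)) return e.getKey();
                }
                return null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return (prefix == null
                        ? Collections.<String>emptyList()
                        : Collections.singletonList(prefix)).iterator();
            }
        };
    }

    // Returns the text of axis2:ServiceGroupId from the SOAP header.
    static String serviceGroupId(String soapXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed paths to match
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapXml)));
            Map<String, String> ns = new HashMap<>();
            ns.put("soapenv", "http://schemas.xmlsoap.org/soap/envelope/");
            ns.put("wsa", "http://www.w3.org/2005/08/addressing");
            ns.put("axis2", "http://ws.apache.org/namespaces/axis2");
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(contextOf(ns));
            return xpath.evaluate("/soapenv:Envelope/soapenv:Header/wsa:ReplyTo"
                    + "/wsa:ReferenceParameters/axis2:ServiceGroupId", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read ServiceGroupId", e);
        }
    }
}
```

The prefixes in the map only need to match the ones used in the XPath expression; they do not have to be the same prefixes the document uses, as long as the URIs agree.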
Edit
Assuming that urn:uuid: is a constant, the following XPath will strip off the first 9 characters of the result (use with either of the above XPath). If urn:uuid isn't constant, then you'll need to tokenize / split etc, which is beyond my skills.
substring(string(/*[local-name()='Envelope']/*[local-name()='Header']
/*[local-name()='ReplyTo']/*[local-name()='ReferenceParameters']
/*[local-name()='ServiceGroupId']/text()), 10)
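As a quick check of the character counting, and of an alternative that avoids it, the sketch below evaluates the string functions on a literal. XPath 1.0 also has substring-after(), which can be given the literal prefix, or applied twice on ':' when only the shape of the prefix (two colon-terminated segments) is known. UuidPrefix and eval are made-up names, and an empty document serves as a dummy context node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class UuidPrefix {

    // Evaluates a context-free XPath 1.0 expression and returns its string value.
    static String eval(String expression) {
        try {
            Document empty = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            return XPathFactory.newInstance().newXPath().evaluate(expression, empty);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String id = "'urn:uuid:99A029EBBC70DBEB221347349722532'";
        // "urn:uuid:" is 9 characters long, so the payload starts at position 10.
        System.out.println(eval("substring(" + id + ", 10)"));
        // Same result without counting characters.
        System.out.println(eval("substring-after(" + id + ", 'urn:uuid:')"));
        // Drops the first two colon-terminated segments, whatever they contain.
        System.out.println(eval("substring-after(substring-after(" + id + ", ':'), ':')"));
    }
}
```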
I am looking for some examples of using XPath in Android, or for anyone who can share their experiences. I have been struggling to make heads or tails of this problem :-(
I have a string that contains a standard xml file. I believe I need to convert that into an xml document. I have found this code which I think will do the trick:
public static Document stringToDom(String xmlSource)
        throws SAXException, ParserConfigurationException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(new InputSource(new StringReader(xmlSource)));
}
Next steps
Assuming the code above is OK, I need to apply xpath to get values from cat: "/animal/mammal/feline/cat"
I look at the dev doc here: http://developer.android.com/reference/javax/xml/xpath/XPath.html and also look online, but I am not sure where to start!
I have tried to use the following code:
XPathFactory xPathFactory = XPathFactory.newInstance();
// To get an instance of the XPathFactory object itself.
XPath xPath = xPathFactory.newXPath();
// Create an instance of XPath from the factory class.
String expression = "SomeXPathExpression";
XPathExpression xPathExpression = xPath.compile(expression);
// Compile the expression to get a XPathExpression object.
Object result = xPathExpression.evaluate(xmlDocument);
// Evaluate the expression against the XML Document to get the result.
But I get "Cannot be resolved". Eclipse doesn't seem to be able to fix this import. I tried manually entering:
javax.xml.xpath.XPath
But this did not work.
Does anyone know of any good source code that I can use on the Android platform (1.5)?
What version of the SDK are you using? XPath was introduced in SDK 8(2.2). If you aren't building against that version then the class doesn't exist.
Rpond is correct:
javax.xml.xpath was introduced in API level 8 (Android 2.2).
Android 1.5 is only API Level 3.
References
package javax.xml.xpath -- "Since: API Level 8"
Description -- with examples
Android API levels -- "Platform Version Android 1.5 - API Level 3"