namespace-unaware XPath expression fails if Saxon is on the CLASSPATH

I have the following sample XML file:
<a xmlns="http://www.foo.com">
<b>
</b>
</a>
Using the XPath expression /foo:a/foo:b (with 'foo' properly configured in the NamespaceContext) I can correctly count the number of b nodes and the code works both when Saxon-HE-9.4.jar is on the CLASSPATH and when it's not.
When, however, I parse the same file with a namespace-unaware DocumentBuilderFactory, the XPath expression "/a/b" correctly counts the number of b nodes only when Saxon-HE-9.4.jar is not on the CLASSPATH.
Code below:
import java.io.*;
import java.util.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.xml.namespace.NamespaceContext;
public class FooMain {
public static void main(String args[]) throws Exception {
String xmlSample = "<a xmlns=\"http://www.foo.com\"><b></b></a>";
{
XPath xpath = namespaceUnawareXpath();
System.out.printf("[NS-unaware] Number of 'b' nodes is: %d\n",
((NodeList) xpath.compile("/a/b").evaluate(stringToXML(xmlSample, false),
XPathConstants.NODESET)).getLength());
}
{
XPath xpath = namespaceAwareXpath("foo", "http://www.foo.com");
System.out.printf("[NS-aware ] Number of 'b' nodes is: %d\n",
((NodeList) xpath.compile("/foo:a/foo:b").evaluate(stringToXML(xmlSample, true),
XPathConstants.NODESET)).getLength());
}
}
public static XPath namespaceUnawareXpath() {
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
return xpath;
}
public static XPath namespaceAwareXpath(final String prefix, final String nsURI) {
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
NamespaceContext ctx = new NamespaceContext() {
@Override
public String getNamespaceURI(String aPrefix) {
if (aPrefix.equals(prefix))
return nsURI;
else
return null;
}
@Override
public Iterator getPrefixes(String val) {
throw new UnsupportedOperationException();
}
@Override
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
};
xpath.setNamespaceContext(ctx);
return xpath;
}
private static Document stringToXML(String s, boolean nsAware) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(nsAware);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8")));
}
}
Running the above with:
java -classpath dist/foo.jar FooMain
.. produces:
[NS-unaware] Number of 'b' nodes is: 1
[NS-aware ] Number of 'b' nodes is: 1
Running with:
java -classpath Saxon-HE-9.4.jar:dist/foo.jar FooMain
... produces:
[NS-unaware] Number of 'b' nodes is: 0
[NS-aware ] Number of 'b' nodes is: 1

Correct observation. Saxon doesn't work with a namespace-unaware DOM. There's no reason why it should. If you can find an XSLT/XPath processor that works with a namespace-unaware DOM, then go ahead and use it if you want, but its behaviour isn't defined by any standard.
If it were possible for Saxon to detect that the DOM is namespace-unaware, then it would throw an error rather than giving spurious results. Sadly, one of DOM's many design failings is that if you didn't create the DOM yourself, you can't tell whether it's namespace-aware or not.
Your comment "I need to be lenient on namespaces since I have to handle 3rd-party XML instances that are not always XSD valid." is a complete non-sequitur. It's true that a document can't be XSD-valid unless it is namespace-valid, but the converse is not true; loads of documents are namespace-valid without being XSD-valid.
Finally, as your experience shows, relying on the JAXP mechanism to load whatever XPath processor happens to be lying around on the classpath is very error-prone. You can't even control whether you get an XPath 1.0 or 2.0 processor by this mechanism (and again, you can't find out easily which you have got). If your code is dependent on the quirks of a particular XPath implementation then you need to load that implementation explicitly rather than relying on the JAXP search.
UPDATE (Sep 2015): Saxon 9.6 no longer includes the meta-inf services file that advertises it as a JAXP XPath provider. This means you will never pick up Saxon as your XPath processor simply because it is on the classpath: you have to ask for it explicitly.
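As a minimal sketch of asking for a specific implementation instead of relying on the JAXP search: on Java 9 or later, XPathFactory.newDefaultInstance() returns the JDK's built-in factory regardless of what else is on the classpath. The class name ExplicitXPathDemo and the helper countB are made up for illustration. The namespace-unaware result shown here relies on the built-in processor behaving as reported in the question, which no standard guarantees.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExplicitXPathDemo {

    // Hypothetical helper: counts /a/b on a namespace-unaware DOM using the
    // JDK's built-in XPath processor (requires Java 9+ for newDefaultInstance).
    static int countB(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newDefaultInstance();
            dbf.setNamespaceAware(false);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Bypasses the classpath lookup done by XPathFactory.newInstance().
            XPath xpath = XPathFactory.newDefaultInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("/a/b", doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xmlSample = "<a xmlns=\"http://www.foo.com\"><b></b></a>";
        System.out.println("Number of 'b' nodes: " + countB(xmlSample));
    }
}
```

With this approach, adding or removing Saxon from the classpath should not change which processor evaluates the expression.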

The XPath language is only defined on namespace-well-formed XML, so the behaviour of different processors on a non-namespace-aware DOM tree (even one like <a><b/></a> that, had it been parsed in a namespace-aware manner, would not actually use any namespaces) is at best implementation-specific and at worst completely undefined.

Saxon 10 now supports XPath expressions without namespace prefixes. You can configure it like this:
XPath xPath = new net.sf.saxon.xpath.XPathFactoryImpl().newXPath();
((XPathEvaluator)xPath).getStaticContext().setUnprefixedElementMatchingPolicy(UnprefixedElementMatchingPolicy.ANY_NAMESPACE);

Related

Navigating XML using XPATH with a different namespace

I am struggling to find out how to navigate into the area of the XML that uses namespaces. Using basic XPath I can navigate to the MessageDetail node fine, but I am not sure what I need to do to get into that block, as everything inside it uses namespaces. Please could someone help?
Thanks
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<MessageList>
<MessageCount>2</MessageCount>
<DateTimeStamp>2016-02-11T12:50:26</DateTimeStamp>
<MessageDetail>
<MessageID>2332445456767</MessageID>
<Env:MessageContainer xmlns:Env="http://www.somesite.com/schema/v1.0/envelope" xmlns:BS="http://www.somesite.com/schema/v1.0/BusinessServices">
<Env:MessageParties>
public List<String> getRefs(String xmlMessageToSend)
{
try
{
Document doc = createDocument(xmlMessageToSend.getBytes());
XPath xpath = xPathFactory.newXPath();
xpath.setNamespaceContext(new NamespaceContext() {
@Override
public String getNamespaceURI(String prefix)
{
if (prefix == null)
throw new NullPointerException("Null prefix");
else if ("Env".equals(prefix))
return "http://www.om.com/schema/v1.0/envelope";
else if ("BS".equals(prefix))
return "http://www.o.com/schema/v1.0/BusinessServices";
return XMLConstants.NULL_NS_URI;
}
@Override
public Iterator getPrefixes(String namespaceURI)
{
throw new UnsupportedOperationException();
}
@Override
public String getPrefix(String namespaceURI)
{
throw new UnsupportedOperationException();
}
});
XPathExpression exp = xpath
.compile("/Message/MessageList/MessageDetail/Env:MessageContainer");
Node result = (Node)exp.evaluate(doc, XPathConstants.NODE);
System.out.println(result.getTextContent());
}
catch (XPathExpressionException | SAXException | IOException | ParserConfigurationException e)
{
e.printStackTrace();
}
return new ArrayList<String>();
}

You don't say what you're using to navigate the document, but generally, there should be a way in the API of whatever you're using for you to declare a namespace prefix that matches the one on Env:MessageContainer. Then you can use that prefix in your XPath, e.g. //e:MessageContainer (assuming you mapped 'e' to "http://www.somesite.com/schema/v1.0/envelope").

You need to use a namespace prefix in the XPath expression. For example,
//foo:MessageContainer
This prefix need not be the same prefix used for the namespace URI as in the original document. Here I've used the prefix foo even though it was Env in your document. As long as both prefixes map to the same URI (http://www.somesite.com/schema/v1.0/envelope in this example) the XPath will match.
How exactly you bind this prefix to the desired namespace URI varies depending on the host language in which the XPath expression is embedded. In XSLT, for example, you simply declare the relevant prefixes in the XSLT stylesheet, much as you would in any other XML document. In XOM, by contrast, you have to supply an XPathContext object that maps the namespace prefixes accordingly. And so on for other languages and APIs.
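For the standard Java (JAXP) API, the binding could look roughly like the sketch below. It maps the prefix foo to the envelope URI through a NamespaceContext, even though the document itself uses Env. The class name PrefixBindingDemo and the helper count are illustrative, and the sample document in main is a cut-down stand-in for the one in the question.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrefixBindingDemo {
    static final String ENV_URI = "http://www.somesite.com/schema/v1.0/envelope";

    // Hypothetical helper: counts the nodes selected by the expression,
    // with the prefix "foo" bound to ENV_URI.
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for namespace matching
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "foo".equals(prefix) ? ENV_URI : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    throw new UnsupportedOperationException();
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    throw new UnsupportedOperationException();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Message><Env:MessageContainer xmlns:Env=\"" + ENV_URI + "\">"
                + "<Env:MessageParties/></Env:MessageContainer></Message>";
        System.out.println(count(xml, "//foo:MessageContainer"));
    }
}
```

The URI returned by the NamespaceContext has to match the one declared in the document character for character; the prefix is free to differ.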

You can access your elements via e.g.
//Env:MessageContainer
But to achieve this, your xmlns:Env="http://www.somesite.com/schema/v1.0/envelope" xmlns:BS="http://www.somesite.com/schema/v1.0/BusinessServices" should be defined in the <root> element instead of (or in addition to) the <Env:MessageContainer> element.
But if you cannot change your source XML, the correct solution would be the comprehensive style of writing the same thing as above:
//*[local-name()='MessageContainer'][namespace-uri()='http://www.somesite.com/schema/v1.0/envelope']
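A minimal sketch of evaluating that expression with the standard Java API follows. No NamespaceContext is needed because the expression contains no prefixes, but the document still has to be parsed namespace-aware so that local-name() and namespace-uri() return meaningful values. The class name LocalNameDemo and the helper count are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocalNameDemo {
    static final String EXPRESSION =
            "//*[local-name()='MessageContainer']"
            + "[namespace-uri()='http://www.somesite.com/schema/v1.0/envelope']";

    // Hypothetical helper: counts the elements matched by EXPRESSION.
    static int count(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(EXPRESSION, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Message><Env:MessageContainer "
                + "xmlns:Env=\"http://www.somesite.com/schema/v1.0/envelope\"/></Message>";
        System.out.println(count(xml));
    }
}
```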

Get all the tags and values from XML using SAX parser in java

I am trying to parse XML using SAX. I want all the tags and their values from the XML in a nested way. Is this possible with a SAX parser? Can anyone provide an example? (I think SAX is more efficient than the W3C DocumentBuilder, so I chose it, and I want to know whether I'm on the right path.)
I'm attaching my java program
class MySAXApp extends DefaultHandler
{
public MySAXApp ()
{
super();
}
public void startDocument ()
{
System.out.println("Start document");
}
public void endDocument ()
{
System.out.println("End document");
}
public void startElement (String uri, String name,
String qName, Attributes atts)
{
System.out.println(atts.getLength());
if ("".equals (uri))
System.out.println("Start element: " + qName);
else
System.out.println("Start element: {" + uri + "}" + name);
}
}
Here is my XML.
Is this valid XML? Are there any errors in writing XML like this?
<?xml version="1.0" encoding="utf-8"?>
<CustomerReport xsi:schemaLocation="Customer.xsd">
<Customer>
<CustomerName>str1234</CustomerName>
<CustomerStatus>str1234</CustomerStatus>
<PurchaceOrders>
<PurchaceOrder>
<PurchaceOrderName>str1234</PurchaceOrderName>
</PurchaceOrder>
</PurchaceOrders>
</Customer>
</CustomerReport>
I'm new to XML. Can someone help me on this

When you say SAX is "more efficient", what you actually mean is that a SAX parser does the minimum amount of work, leaving most of the work to the application. That means you (the application writer) have more code to write, and it's quite tricky code as you are discovering. Because the people who write XML parsers are much more experienced Java coders than you are, it's likely that the more work you do in your code, and the less you do within the parser, the less efficient your overall application will be. So given your level of experience, my advice would be to use a parsing approach where the library does as much as possible of the work. I would suggest using JDOM2.

The only attribute in the XML you posted is the one with the xsi prefix. For every other element, atts.getLength() should return 0.
Attributes are key-value pairs inside a tag. Most of your xml content is inside of elements.
The efficiency advantage of SAX (or StAX) over something like JDOM comes from the SAX parser not keeping all the data it reads in memory. If you use the ContentHandler to retrieve data and save it as it is read, your program doesn't have to consume that much memory.
Read this tutorial or this Javaworld article. You need to implement a characters method in order to get any element text. Both linked articles have good examples of how to implement your characters method so that you can retrieve element text.
There are a lot of bad examples for this that you are likely to find if you google around (bad example) or search on stackoverflow (bad example here), but the example implementations in the linked articles are correct, because they buffer the output from the characters method until all characters have been found:
Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand and still be a standard-conforming implementation. So if your application needs to process the characters it sees, it is wise to have the characters() method accumulate the characters in a java.lang.StringBuffer and operate on them only when you are sure that all of them have been found.
Here is the ContentHandler from the JavaWorld article's hello world example changed to use your xml:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
public class Example2 extends DefaultHandler {
// Local variables to store data
// found in the XML document
public String name = "";
public String status = "";
public String orderName = "";
// Buffer for collecting data from
// the "characters" SAX event.
private CharArrayWriter contents = new CharArrayWriter();
// Override methods of the DefaultHandler class
// to gain notification of SAX Events.
//
// See org.xml.sax.ContentHandler for all available events.
//
public void startElement( String namespaceURI,
String localName,
String qName,
Attributes attr ) throws SAXException {
contents.reset();
}
public void endElement( String namespaceURI,
String localName,
String qName ) throws SAXException {
if ( localName.equals( "CustomerName" ) ) {
name = contents.toString();
}
if ( localName.equals( "CustomerStatus" ) ) {
status = contents.toString();
}
if (localName.equals("PurchaceOrderName")) {
orderName = contents.toString();
}
}
public void characters( char[] ch, int start, int length )
throws SAXException {
contents.write( ch, start, length );
}
}
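To get every tag and its value in a nested way, as the question asks, the same buffering idea can be combined with a stack of open element names. The following is only a sketch under some assumptions: the class name NestedSaxDemo and the helper describe are made up, element names are taken from qName (which works with the default, namespace-unaware SAXParserFactory), and text is reported only for elements that directly contain non-whitespace characters.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NestedSaxDemo extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();  // currently open elements
    private final StringBuilder text = new StringBuilder(); // buffered character data
    private final List<String> lines = new ArrayList<>();   // "path = value" results

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so only accumulate here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            lines.add(String.join("/", path) + " = " + value);
        }
        path.removeLast();
        text.setLength(0);
    }

    // Hypothetical helper: parses the XML and returns one line per text-bearing element.
    static List<String> describe(String xml) {
        try {
            NestedSaxDemo handler = new NestedSaxDemo();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Customer><CustomerName>str1234</CustomerName><PurchaceOrders>"
                + "<PurchaceOrder><PurchaceOrderName>po1</PurchaceOrderName></PurchaceOrder>"
                + "</PurchaceOrders></Customer>";
        describe(xml).forEach(System.out::println);
    }
}
```

Only the stack of open element names and the current text buffer are held by the handler itself; the list of results is kept here purely so the sketch has something to return.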

Unmarshalling XML to an Iterable<Long> using JAXB

I have some very simple XML that I wish to unmarshall. I'm only interested in the values for one of the elements which repeats. Here's some sample XML:
<Document>
<HeaderGuff>Whatever</HeaderGuff>
<Foos>
<FooId>1</FooId>
<FooId>2</FooId>
</Foos>
</Document>
I would like to use JAXB to allow me to iterate over the FooId values as Longs.
The usual examples require creating a data class with setFooId and getFooId methods. Is there a way to unmarshall directly to Long such that I can do this:
for ( Long fooId : <something JAXB> )
I do not want to load all the identifiers into memory at once as there are potentially many of them, and they are only needed one at a time for individual processing.

Since you are only interested in one of the elements and don't want to bring all the data into memory at once I would use a StAX parser (included in the JDK/JRE since Java SE 6) instead of JAXB for this use case.
You would then advance your XMLStreamReader to the FooId element, process it and then advance it to the next element.
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
StreamSource source = new StreamSource("input.xml");
XMLStreamReader xsr = xif.createXMLStreamReader(source);
while(xsr.hasNext()) {
if(xsr.isStartElement() && "FooId".equals(xsr.getLocalName())) {
long value = Long.parseLong(xsr.getElementText());
System.out.println(value);
}
xsr.next();
}
xsr.close();
}
}
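If the for ( Long fooId : ... ) form from the question is wanted, the StAX loop can be wrapped in an Iterator. This is a sketch rather than a finished utility: the class FooIdIterator and the factory method fooIds are hypothetical names, the XML is taken from a String for brevity, and the iterator reads one value ahead so that hasNext() can answer without buffering the rest of the document.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FooIdIterator implements Iterator<Long> {
    private final XMLStreamReader reader;
    private Long nextValue; // look-ahead value; null once the stream is exhausted

    FooIdIterator(XMLStreamReader reader) {
        this.reader = reader;
        advance();
    }

    // Moves the reader forward to the next FooId element, if there is one.
    private void advance() {
        nextValue = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "FooId".equals(reader.getLocalName())) {
                    nextValue = Long.parseLong(reader.getElementText().trim());
                    return;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextValue != null;
    }

    @Override
    public Long next() {
        if (nextValue == null) {
            throw new NoSuchElementException();
        }
        Long current = nextValue;
        advance();
        return current;
    }

    // Hypothetical factory: each call to iterator() starts a fresh pass over the XML.
    static Iterable<Long> fooIds(String xml) {
        return () -> {
            try {
                return new FooIdIterator(XMLInputFactory.newFactory()
                        .createXMLStreamReader(new StringReader(xml)));
            } catch (XMLStreamException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    public static void main(String[] args) {
        String xml = "<Document><HeaderGuff>Whatever</HeaderGuff>"
                + "<Foos><FooId>1</FooId><FooId>2</FooId></Foos></Document>";
        for (Long fooId : fooIds(xml)) {
            System.out.println(fooId);
        }
    }
}
```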

How to replace a value in xpath expression [duplicate]

I have an XPath expression that searches for a static value. In this example, "test" is that value:
XPathExpression expr = xpath.compile("//doc[contains(., 'test')]/*/text()");
How can I pass a variable instead of a fixed string? I use Java with Eclipse. Is there a way to use the value of a Java String to declare an XPath variable?

You can define a variable resolver and have the evaluation of the expression resolve variables such as $myvar, for example:
XPathExpression expr = xpath.compile("//doc[contains(., $myVar)]/*/text()");
There's a fairly good explanation here. I haven't actually done this before myself, so I might have a go and provide a more complete example.
Update:
Given this a go, works a treat. For an example of a very simple implementation, you could define a class that returns the value for a given variable from a map, like this:
class MapVariableResolver implements XPathVariableResolver {
// local store of variable name -> variable value mappings
Map<String, String> variableMappings = new HashMap<String, String>();
// a way of setting new variable mappings
public void setVariable(String key, String value) {
variableMappings.put(key, value);
}
// override this method in XPathVariableResolver to
// be used during evaluation of the XPath expression
@Override
public Object resolveVariable(QName varName) {
// if using namespaces, there's more to do here
String key = varName.getLocalPart();
return variableMappings.get(key);
}
}
Now, declare and initialise an instance of this resolver in the program, for example
MapVariableResolver vr = new MapVariableResolver() ;
vr.setVariable("myVar", "text");
...
XPath xpath = factory.newXPath();
xpath.setXPathVariableResolver(vr);
Then, during evaluation of the XPath expression XPathExpression expr = xpath.compile("//doc[contains(., $myVar)]/*/text()");, the variable $myVar will be replaced with the string text.
Nice question, I learnt something useful myself!
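Putting the pieces together, a self-contained version might look like the sketch below. The class name XPathVariableDemo, the helper countMatches, and the sample document are assumptions made for illustration; the resolver is written as a lambda and ignores namespaces, and it is installed before the expression is compiled.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathVariableDemo {

    // Hypothetical helper: counts the text nodes selected when $myVar is bound to value.
    static int countMatches(String xml, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Map<String, Object> variables = new HashMap<>();
            variables.put("myVar", value);

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve variables by local name only; set before compiling.
            xpath.setXPathVariableResolver(qname -> variables.get(qname.getLocalPart()));

            XPathExpression expr = xpath.compile("//doc[contains(., $myVar)]/*/text()");
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><doc><title>a test</title></doc>"
                + "<doc><title>other</title></doc></root>";
        System.out.println(countMatches(xml, "test"));
    }
}
```

Because the value is passed as a variable rather than spliced into the expression string, quotes inside the value cannot change the structure of the XPath.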

You don't need to evaluate Java (or any other programming language's) variables in XPath. In C# (I don't know Java well) I would build the expression string, quoting the value so that it is treated as a string literal:
string XPathExpression =
"//doc[contains(., '" + myVar.ToString() + "')]/*/text()";
XmlNodeList result = xmlDoc.SelectNodes(XPathExpression);

Apart from this answer here, that explains well how to do it with the standard Java API, you could also use a third-party library like jOOX that can handle variables in a simple way:
List<String> list = $(doc).xpath("//doc[contains(., $1)]/*", "test").texts();

I use something similar to @brabster:
// expression: "/message/PINConfiguration/pinValue[../keyReference=$keyReference]";
Optional<Node> getNode(String xpathExpression, Map<String, String> variablesMap)
throws XPathExpressionException {
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setXPathVariableResolver(qname -> variablesMap.get(qname.getLocalPart()));
return Optional.ofNullable((Node) xpath.evaluate(xpathExpression, document,
XPathConstants.NODE));
}
Optional<Node> getNode(String xpathExpression) throws XPathExpressionException {
return getNode(xpathExpression, Collections.emptyMap());
}

Find elements in a Node without the proper namespace, in Java

So I have a xml doc that I've declared here:
DocumentBuilder dBuilder = dbFactory_.newDocumentBuilder();
StringReader reader = new StringReader(s);
InputSource inputSource = new InputSource(reader);
doc_ = dBuilder.parse(inputSource);
Then I have a function where I pass in a string and I want to match that to an element in my xml:
void foo(String str)
{
NodeList nodelist = doc_.getDocumentElement().getElementsByTagName(str);
}
The problem is when the str comes in it doesn't have any sort of namespace in it so the xml that I would be testing would be:
<Random>
<tns:node />
</Random>
and the str will be node. So nodelist is now null because it's expecting tns:node but I passed in node. I know it's not good to ignore the namespace, but in this instance it's fine. My problem is that I don't know how to search the Node for an element while ignoring the namespace. I also thought about adding the namespace to the str that comes in, but I have no idea how to do that either.
Any help would be greatly appreciated,
Thanks, -Josh

In order to match all nodes whose name is 'str' regardless of namespace use the following:
NodeList nodes = doc.getDocumentElement().getElementsByTagNameNS("*", str);
The wildcard "*" will match any namespace. See Element.getElementsByTagNameNS(...).
Edit: in addition, as @Wheezil correctly stated in a comment, you have to call DocumentBuilderFactory.setNamespaceAware(true) for this to work; otherwise namespaces will not be detected.
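A small end-to-end sketch of this answer is shown below. The class name AnyNamespaceLookupDemo and the helper countByLocalName are illustrative, and the sample document declares the tns prefix with a made-up URI (urn:example:tns), since a namespace-aware parser rejects an undeclared prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AnyNamespaceLookupDemo {

    // Hypothetical helper: counts descendant elements with the given local name,
    // whatever namespace they are in.
    static int countByLocalName(String xml, String localName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this the "*" lookup finds nothing
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getDocumentElement().getElementsByTagNameNS("*", localName);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The tns namespace URI is an assumption made for this example.
        String xml = "<Random xmlns:tns=\"urn:example:tns\"><tns:node/></Random>";
        System.out.println(countByLocalName(xml, "node"));
    }
}
```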
