Fetch node value containing escape characters using xpath - java

I am using the Java XML API to fetch a node value for a given XPath.
Here is the code I am using:
final XPathFactory factory = XPathFactory.newInstance();
final XPath xpath1 = factory.newXPath();
xpath1.setNamespaceContext(new MyNamespaceContext());
InputSource inputSource = new InputSource(extractData(fileName));
inputSource.setEncoding("UTF-8");
String nodeValue = xpath1.evaluate(xpath, inputSource);
The XML file contains the node value as Some&gt;data.
The node value I am expecting is Some&gt;data, but the value returned by the above code is Some>data.
Can anyone help me change the above code so that I get the node data as Some&gt;data?
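The behaviour described here can be reproduced with a small sketch: the XML parser resolves the entity reference &gt; to > before XPath ever sees the text, so evaluate() can only return Some>data. If the escaped form is really needed, one option (an assumption, not taken from this thread) is to re-escape the returned string yourself. The class name EscapedValue and the helper escapeXmlText are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class EscapedValue {
    // Hypothetical helper: re-escapes the characters that XML reserves in text content.
    // '&' is replaced first so the '&' of the newly added entities is not escaped again.
    static String escapeXmlText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String nodeValue(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // By the time XPath evaluates, the parser has already turned &gt; into '>'.
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<node>Some&gt;data</node>";
        String raw = nodeValue(xml, "/node");
        System.out.println(raw);                // Some>data
        System.out.println(escapeXmlText(raw)); // Some&gt;data
    }
}
```

Note that re-escaping produces a canonical form; it cannot tell whether the file originally used &gt;, &#62; or a literal >, since all three parse to the same character.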

Related

How to write XPath to get node attribute value from a "Name Space XML" in Java

INPUT_XML:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:ns1="http://path1/schema1" xmlns:ns2="http://path2/schema2">
<ns1:abc>1234</ns1:abc>
<ns2:def>5678</ns2:def>
</root>
In Java, I am trying to write an XPath expression that will get the value of the attribute "xmlns:ns1" from the above INPUT_XML string content.
I've tried the following:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
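// parse(String) expects a URI, so wrap the XML content in an InputSource
Document document = builder.parse(new InputSource(new StringReader(INPUT_XML)));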
String xpathExpression = "/root/xmlns:ns1";
// Create XPathFactory object
XPathFactory xpathFactory = XPathFactory.newInstance();
// Create XPath object
XPath xpath = xpathFactory.newXPath();
// Create XPathExpression object
XPathExpression expr = xpath.compile(xpathExpression);
// Evaluate expression result on XML document
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
But the above code does not give the expected value of the specified attribute, i.e. xmlns:ns1. I strongly suspect the XPath expression is wrong. Please suggest the right XPath expression or the right approach to tackle this issue.
If you're using an XPath 1.0 processor, or an XPath 2.0 processor with XPath 1.0 compatibility mode turned on, you can use the namespace axis to select the namespace value.
You will need to make the following change in your code:
String xpathExpression = "/root/namespace::ns1";
The xmlns:ns1="http://path1/schema1" and xmlns:ns2="http://path2/schema2" are not attributes but namespace declarations. You cannot retrieve them with an XPath expression that easily (there is the XPath function namespace-uri() for this purpose, but the root element itself is in no namespace; it only declares the prefixes for use by its descendants).
When using the DOM API, you could use the lookupNamespaceURI() method:
System.out.println("ns1 = " + doc.getDocumentElement().lookupNamespaceURI("ns1"));
System.out.println("ns2 = " + doc.getDocumentElement().lookupNamespaceURI("ns2"));
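As a self-contained sketch of the DOM approach, assuming the INPUT_XML from the question (with the XML declaration closed by ?>): the builder must be namespace-aware, otherwise the xmlns:* attributes are not treated as namespace declarations and lookupNamespaceURI() returns null. The class name LookupNamespace is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LookupNamespace {
    static final String INPUT_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<root xmlns:ns1=\"http://path1/schema1\" xmlns:ns2=\"http://path2/schema2\">"
            + "<ns1:abc>1234</ns1:abc><ns2:def>5678</ns2:def></root>";

    // Parses the string content (not a URI) with a namespace-aware builder.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(INPUT_XML);
        // Resolves each prefix against the declarations in scope on the root element.
        System.out.println("ns1 = " + doc.getDocumentElement().lookupNamespaceURI("ns1")); // ns1 = http://path1/schema1
        System.out.println("ns2 = " + doc.getDocumentElement().lookupNamespaceURI("ns2")); // ns2 = http://path2/schema2
    }
}
```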
When using XPath, you could try the following expressions:
namespace-uri(/*[local-name()='root']/*[local-name()='abc'])
namespace-uri(/*[local-name()='root']/*[local-name()='def'])
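A minimal sketch of evaluating those two expressions with the JDK's built-in XPath, again assuming a namespace-aware parse of the question's XML; namespace-uri() returns a string, so the String-returning evaluate() overload is enough. The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceUriXPath {
    static final String INPUT_XML =
            "<root xmlns:ns1=\"http://path1/schema1\" xmlns:ns2=\"http://path2/schema2\">"
            + "<ns1:abc>1234</ns1:abc><ns2:def>5678</ns2:def></root>";

    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // elements need real namespace URIs
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() sidesteps prefixes, so no NamespaceContext is required.
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate(INPUT_XML,
                "namespace-uri(/*[local-name()='root']/*[local-name()='abc'])")); // http://path1/schema1
        System.out.println(evaluate(INPUT_XML,
                "namespace-uri(/*[local-name()='root']/*[local-name()='def'])")); // http://path2/schema2
    }
}
```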

Why does getLocalName() return null?

I'm loading an XML string like this:
Document doc = getDocumentBuilder().parse(new InputSource(new StringReader(xml)));
Later, I extract a node from this Document:
XPath xpath = getXPathFactory().newXPath();
XPathExpression expr = xpath.compile(expressionXPATH);
NodeList nodeList = (NodeList)expr.evaluate(doc, XPathConstants.NODESET);
Node node = nodeList.item(0);
Now I want to get the local name of this node but I get null.
node.getLocalName(); // return null
With the debugger, I saw that my node has the following type: DOCUMENT_POSITION_DISCONNECTED.
The Javadoc states that getLocalName() returns null for this type of node.
Why is the node of type DOCUMENT_POSITION_DISCONNECTED and not ELEMENT_NODE?
How can I "convert" the type of the node?
As the documentation https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html#getLocalName() states:
for nodes created with a DOM Level 1 method, [...] this is always null
So make sure you use a namespace-aware DocumentBuilderFactory by calling setNamespaceAware(true); that way the DOM is built with the namespace-aware DOM Level 2/3 methods, and getLocalName() returns a non-null value.
A simple test program
String xml = "<root/>";
DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
Document dom1 = db.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
System.out.println(dom1.getDocumentElement().getLocalName() == null);
db.setNamespaceAware(true);
Document dom2 = db.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
System.out.println(dom2.getDocumentElement().getLocalName() == null);
outputs
true
false
So (at least) the local-name problem you have is caused by using a DOM Level 1, non-namespace-aware document builder factory.
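The same fix applied to the question's XPath lookup might look like the sketch below; the class name, the method firstMatch and the sample XML are made up for illustration. It also compares Node.ELEMENT_NODE with Node.DOCUMENT_POSITION_DISCONNECTED: both constants have the value 1, which may be why the debugger displayed the node's type under that name even though the node is an ordinary element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalNameDemo {
    // Parses xml (optionally namespace-aware) and returns the first node matching expression.
    static Node firstMatch(String xml, String expression, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList nodeList = (NodeList) XPathFactory.newInstance().newXPath()
                .compile(expression).evaluate(doc, XPathConstants.NODESET);
        return nodeList.item(0);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item/></root>";
        Node plain = firstMatch(xml, "/root/item", false);
        Node aware = firstMatch(xml, "/root/item", true);
        System.out.println(plain.getNodeType() == Node.ELEMENT_NODE); // true
        System.out.println(plain.getLocalName()); // null (DOM Level 1 node)
        System.out.println(aware.getLocalName()); // item
        // Both constants are 1, so a debugger showing either name means the same value.
        System.out.println(Node.ELEMENT_NODE == Node.DOCUMENT_POSITION_DISCONNECTED); // true
    }
}
```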

Java xPath - extract subdocument from XML

I have an XML document as follows:
<DocumentWrapper>
<DocumentHeader>
...
</DocumentHeader>
<DocumentBody>
<Invoice>
<Buyer/>
<Seller/>
</Invoice>
</DocumentBody>
</DocumentWrapper>
I would like to extract the content of the DocumentBody element from it as a String containing the raw XML:
<Invoice>
<Buyer/>
<Seller/>
</Invoice>
With XPath it should be simple to select it with:
/DocumentWrapper/DocumentBody
Unfortunately, my Java code doesn't work as expected: it returns empty lines instead of the expected result. Is there any way to do this, or do I have to return a NodeList and then generate an XML document from it?
My Java code:
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression expression = xPath.compile(xPathQuery);
String result = expression.evaluate(xmlDocument);
Calling this method
String result = expression.evaluate(xmlDocument);
is the same as calling this
String result = (String) expression.evaluate(xmlDocument, XPathConstants.STRING);
which returns the string value of the result: the character data of the result node, or, if the result node is an element, the concatenated character data of all its descendant text nodes (markup is never included).
You should probably do something like this:
Node result = (Node) expression.evaluate(xmlDocument, XPathConstants.NODE);
TransformerFactory.newInstance().newTransformer()
.transform(new DOMSource(result), new StreamResult(System.out));
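A self-contained sketch of both behaviours, assuming a compact version of the question's document: the STRING result is empty because the elements contain no text, while the NODE result serialized through an identity Transformer keeps the markup. Two assumptions go beyond the answer: the query is changed to /DocumentWrapper/DocumentBody/* so that only the Invoice element (not the DocumentBody wrapper) is serialized, and the output goes to a StringWriter with OMIT_XML_DECLARATION so a String comes back.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ExtractSubdocument {
    static final String XML = "<DocumentWrapper><DocumentHeader/><DocumentBody>"
            + "<Invoice><Buyer/><Seller/></Invoice></DocumentBody></DocumentWrapper>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // STRING result: only text content, which is empty here because no element holds text.
    static String asText(Document doc, String query) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (String) xPath.evaluate(query, doc, XPathConstants.STRING);
    }

    // NODE result serialized back to markup with an identity Transformer.
    static String asMarkup(Document doc, String query) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        Node result = (Node) xPath.evaluate(query, doc, XPathConstants.NODE);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(result), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println("[" + asText(doc, "/DocumentWrapper/DocumentBody") + "]"); // []
        // The trailing /* selects the first child element of DocumentBody, i.e. Invoice.
        System.out.println(asMarkup(doc, "/DocumentWrapper/DocumentBody/*"));
    }
}
```

If DocumentBody could contain several children, evaluating with XPathConstants.NODESET and transforming each node in turn would be the natural extension.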

How to use XPath to find the node value with a CDATA tag in Java

I used XPath to parse RSS XML data, and the data is:
<rss version="2.0">
<channel>
<title>
<![CDATA[sports news]]>
</title>
</channel>
</rss>
I want to get the text "sports news" using the XPath "/rss/channel/title/text()", but the result is not what I want: the actual result is "\r\n". How can I get the result I want?
The code is below:
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
Node node = (Node) xPath.evaluate("/rss/channel/title/text()", doc,XPathConstants.NODE);
String title = node.getNodeValue();
Try calling setCoalescing(true) on your DocumentBuilderFactory: the parser then converts CDATA sections into text nodes and merges them with the adjacent text, so the title element ends up with a single text child.
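A sketch of the coalescing approach, assuming the RSS snippet from the question inlined as a string (with \n line breaks instead of \r\n). The merged text node still carries the surrounding line breaks, so the sketch trims it; the class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CoalescingCdata {
    static final String RSS = "<rss version=\"2.0\">\n<channel>\n<title>\n"
            + "<![CDATA[sports news]]>\n</title>\n</channel>\n</rss>";

    static String firstTextNode(boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // With coalescing, the CDATA section becomes part of one Text node with its neighbours.
        factory.setCoalescing(coalescing);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(RSS)));
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate("/rss/channel/title/text()", doc, XPathConstants.NODE);
        return node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // Without coalescing, the first text child is only the leading line break.
        System.out.println("[" + firstTextNode(false) + "]");
        System.out.println(firstTextNode(true).trim()); // sports news
    }
}
```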
You could try changing the XPath expression to
"string(/rss/channel/title)"
and use return type STRING instead of NODE:
String title = (String) xPath.evaluate("string(/rss/channel/title)", doc,
XPathConstants.STRING);
This way you are not selecting a text node, but rather the string value of the title element, which consists of the concatenation of all its descendant text nodes.
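A runnable sketch of the string-value approach, assuming the question's RSS inlined as a string. Because the string value of title also includes the whitespace text nodes around the CDATA section, the result is trimmed; as an alternative not mentioned in the answer, the standard XPath 1.0 function normalize-space() strips that whitespace inside the expression itself. Class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataStringValue {
    static final String RSS = "<rss version=\"2.0\">\n<channel>\n<title>\n"
            + "<![CDATA[sports news]]>\n</title>\n</channel>\n</rss>";

    static String evaluate(String expression) throws Exception {
        // Default (non-coalescing) parser, as in the question.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(RSS)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (String) xPath.evaluate(expression, doc, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // string() concatenates all descendant text, including the CDATA content.
        System.out.println(evaluate("string(/rss/channel/title)").trim()); // sports news
        // normalize-space() also trims and collapses internal whitespace runs.
        System.out.println(evaluate("normalize-space(/rss/channel/title)")); // sports news
    }
}
```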

Document - How to get a tag's value by its name?

I'm using Java's DOM parser to parse an XML file.
Let's say I have the following XML:
<?xml version="1.0"?>
<config>
<dotcms>
<endPoint>ip</endPoint>
</dotcms>
</config>
I'd like to get the value of 'endPoint'. I can do it with the following code snippet (assuming that I have already parsed it with DocumentBuilder):
NodeList nodeList = this.doc.getElementsByTagName("dotcms");
Node nValue = (Node) nodeList.item(0);
return nValue.getNodeValue();
Is it possible to get the value of a field by the field's name? Something like:
Node nValue = nodeList.getByName("endPoint"); // something like this...?
You should use XPath for these sorts of tasks:
//endPoint/text()
or:
/config/dotcms/endPoint/text()
Java has built-in support for XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//endPoint/text()");
String value = (String) expr.evaluate(doc, XPathConstants.STRING);
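A self-contained version of the XPath answer, assuming the question's XML without the stray closing </xml> tag. For comparison it also shows a plain DOM alternative that is not part of the answers above: Document.getElementsByTagName() already looks elements up by name, and getTextContent() returns their text. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EndPointLookup {
    static final String CONFIG = "<?xml version=\"1.0\"?>"
            + "<config><dotcms><endPoint>ip</endPoint></dotcms></config>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // XPath route from the answer, with a typed String result instead of Object.
    static String byXPath(Document doc) throws Exception {
        XPathExpression expr = XPathFactory.newInstance().newXPath()
                .compile("/config/dotcms/endPoint/text()");
        return (String) expr.evaluate(doc, XPathConstants.STRING);
    }

    // Plain DOM route: the first <endPoint> element anywhere in the document.
    static String byTagName(Document doc) {
        return doc.getElementsByTagName("endPoint").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(CONFIG);
        System.out.println(byXPath(doc));   // ip
        System.out.println(byTagName(doc)); // ip
    }
}
```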
You could also use jOOX, a jQuery-like DOM wrapper, to write even less code:
import static org.joox.JOOX.$;
// Using css-style selectors
String text1 = $(document).find("endPoint").text();
// Using XPath
String text2 = $(document).xpath("//endPoint").text();
