Java xPath - extract subdocument from XML

I have an XML document as follows:
<DocumentWrapper>
<DocumentHeader>
...
</DocumentHeader>
<DocumentBody>
<Invoice>
<Buyer/>
<Seller/>
</Invoice>
</DocumentBody>
</DocumentWrapper>
I would like to extract the content of the DocumentBody element as a String containing the raw XML:
<Invoice>
<Buyer/>
<Seller/>
</Invoice>
With XPath, this should be as simple as:
/DocumentWrapper/DocumentBody
Unfortunately, my Java code doesn't work as I want. It returns empty lines instead of the expected result. Is there any way to do this, or do I have to return a NodeList and then generate an XML document from it?
My Java code:
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression expression = xPath.compile(xPathQuery);
String result = expression.evaluate(xmlDocument);

Calling this method
String result = expression.evaluate(xmlDocument);
is the same as calling this
String result = (String) expression.evaluate(xmlDocument, XPathConstants.STRING);
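which returns the string value of the result node: its character data, or, if the result node is an element, the concatenated character data of all its descendant text nodes. Since DocumentBody and its descendants contain only whitespace as text, you get empty lines.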
You should probably do something like this:
Node result = (Node) expression.evaluate(xmlDocument, XPathConstants.NODE);
TransformerFactory.newInstance().newTransformer()
.transform(new DOMSource(result), new StreamResult(System.out));
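
For reference, here is a minimal, self-contained sketch of the approach above that returns the subdocument as a String instead of printing it. The class name SubdocumentExtractor and the helper extract are made up for illustration, and the XML is passed in as a String for simplicity. Note that /DocumentWrapper/DocumentBody selects the DocumentBody element itself, so the sketch selects the Invoice child to get only the content.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SubdocumentExtractor {

    /** Hypothetical helper: serializes the first node matched by xPathQuery, or returns null. */
    public static String extract(String xml, String xPathQuery) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Ask for a NODE rather than the default STRING result.
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xPathQuery, doc, XPathConstants.NODE);
        if (node == null) {
            return null;
        }

        // An identity transform turns the DOM node back into markup.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DocumentWrapper><DocumentHeader/><DocumentBody>"
                + "<Invoice><Buyer/><Seller/></Invoice>"
                + "</DocumentBody></DocumentWrapper>";
        System.out.println(extract(xml, "/DocumentWrapper/DocumentBody/Invoice"));
    }
}
```

If DocumentBody can hold several child elements, one option is to evaluate /DocumentWrapper/DocumentBody/* with XPathConstants.NODESET and serialize each node of the resulting NodeList in turn.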

Related

CreateTextNode escape characters in large text string

I am trying to include the correct characters in an XML document text node:
Element request = doc.createElement("requestnode");
request.appendChild(doc.createTextNode(xml));
rootElement.appendChild(request);
The xml string is a segment of a large xml file which I have read in:
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("rootnode");
doc.appendChild(rootElement);
<firstname>John</firstname>
<dateOfBirth>28091999</dateOfBirth>
<surname>Doe</surname>
The problem is that passing this into createTextNode replaces some of the characters:
&lt;firstname&gt;John&lt;/firstname&gt;
&lt;dateOfBirth&gt;28091999&lt;/dateOfBirth&gt;
&lt;surname&gt;Doe&lt;/surname&gt;
Is there any way to keep the original characters (< and >) in the text node? I have read about using importNode, but this is not a complete XML document, only a segment of a file.
Any help would be greatly appreciated.
EDIT: I need the XML string (which is not a well-formed XML document, only a segment of an external XML file) to be in the "requestnode" element, as I am building XML to be imported into SoapUI.

You can't pass element tags together with text to the createTextNode() method. Pass only the text, then append the resulting text node to an element.
If the source is another XML document, you must extract the text node from an element and insert it into the other document. You can't grab a Node (element and text) and insert it as a text node in the other document: the markup is then treated as plain text and escaped on output, which is why you are seeing all the escape characters.
On the other hand, you can insert this Node into the other XML document as a node (if the structure allows it) and it should be just fine.
In your context, I assume "request" is some sort of Node. The child of a Node can be another element, a text node, and so on, so you have to be specific about which one you create.
You can do something like:
Element name = doc.createElement("name");
Element dob = doc.createElement("dateOfBirth");
Element surname = doc.createElement("surname");
name.appendChild( doc.createTextNode("John") );
dob.appendChild( doc.createTextNode("28091999") );
surname.appendChild( doc.createTextNode("Doe") );
Then you can add these elements to a parent node:
node.appendChild(name);
node.appendChild(dob);
node.appendChild(surname);
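UPDATE: As an alternative, you can parse your XML string into a second document and import the resulting nodes. Because the fragment has several top-level elements, it must be wrapped in a temporary root element before parsing. Something like this (untested code, but close):
String xmlString = "<firstname>John</firstname><dateOfBirth>28091999</dateOfBirth><surname>Doe</surname>";
// Wrap the fragment so that the parser sees a single root element.
String wrapped = "<wrapper>" + xmlString + "</wrapper>";
DocumentBuilderFactory fac = javax.xml.parsers.DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
Document newDoc = builder.parse(new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
Element newElem = doc.createElement("whatever");
// If doc already has a root element, append newElem to an existing element instead.
doc.appendChild(newElem);
// Import each child of the temporary root into the target document.
NodeList children = newDoc.getDocumentElement().getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
newElem.appendChild(doc.importNode(children.item(i), true));
}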
Something like that should do the trick.
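
To make the difference concrete, here is a small self-contained sketch that builds the same document twice, once with createTextNode and once with importNode, and serializes both. The class name FragmentImportDemo, the method build, and the temporary "wrapper" element are assumptions made for this illustration. The element names rootnode and requestnode come from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FragmentImportDemo {

    /** Builds rootnode/requestnode and adds the fragment either as text or as real nodes. */
    public static String build(String fragment, boolean asText) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("rootnode");
        doc.appendChild(root);
        Element request = doc.createElement("requestnode");
        root.appendChild(request);

        if (asText) {
            // The markup becomes character data and is escaped when serialized.
            request.appendChild(doc.createTextNode(fragment));
        } else {
            // Wrap the fragment in a temporary root so it can be parsed, then import its children.
            Document parsed = builder.parse(
                    new InputSource(new StringReader("<wrapper>" + fragment + "</wrapper>")));
            NodeList children = parsed.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                request.appendChild(doc.importNode(children.item(i), true));
            }
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String fragment = "<firstname>John</firstname><surname>Doe</surname>";
        System.out.println(build(fragment, true));   // markup escaped inside requestnode
        System.out.println(build(fragment, false));  // real child elements inside requestnode
    }
}
```

The imported variant only works if the fragment is well-formed once wrapped. A fragment that cuts through the middle of an element would still fail to parse.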

Fetch node value containing escape characters using xpath

I am using Java XML API to fetch a node value for a given XPath.
Here is the code I am using to fetch the node value for a given XPath
final XPathFactory factory = XPathFactory.newInstance();
final XPath xpath1 = factory.newXPath();
xpath1.setNamespaceContext(new MyNamespaceContext());
InputSource inputSource = new InputSource(extractData(fileName));
inputSource.setEncoding("UTF-8");
String nodeValue = xpath1.evaluate(xpath, inputSource);
The XML file contains the node value as Some&gt;data
The node value I am expecting is Some&gt;data, but the value returned by the above code is Some>data
Can anyone help me change the above code so that I get the node data as Some&gt;data?

How to retrieve a specific node's value in XPath?

I have an XML file with this format:
<object>
<origin>1:1:1</origin>
<normal>2:2:2</normal>
<leafs>
<object>
<origin>1:1:1</origin>
<normal>3:3:3</normal>
<leafs>none</leafs>
</object>
</leafs>
</object>
How could I retrieve the value "none" of element <leafs> on second level of the tree? I used this
XPathExpression expLeafs = xpath.compile("*[name()='leafs']");
Object resLeafs = expLeafs.evaluate(node, XPathConstants.NODESET);
NodeList leafsList = (NodeList) resLeafs;
if (!leafsList.item(0).getFirstChild().getNodeValue().equals("none"))
more code...
but it doesn't work because there are some empty text nodes before and after "none". Is there a way to deal with this, something like xpath.compile("*[value()='none']")?

I just ran a simple test program using your XML file and
expr = xpath.compile("/object/leafs/object/leafs/text()");
and got the desired "none" result. If you have additional requirements, you'll have to edit your question.

After checking the line of code @Lord Torgamus provided, I managed to parse the document as I needed, like this:
XPathExpression expLeafs = xpath.compile("*[name()='leafs']");
Object resLeafs = expLeafs.evaluate(node, XPathConstants.NODESET);
NodeList leafsList = (NodeList) resLeafs;
Node nd = leafsList.item(0);
XPathExpression expr = xpath.compile("text()");
Object resultObj = expr.evaluate(nd, XPathConstants.NODE);
String str = expr.evaluate(nd).trim();
System.out.println(str);
and the output is "none" with no other empty text node.
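
As a rough, self-contained sketch of both ideas in this thread: the absolute text() path from the first answer, plus a whitespace-tolerant test relative to a context node using the standard XPath 1.0 function normalize-space(). The normalize-space() comparison is my own suggestion, not something from the answers above, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LeafsLookup {

    static final String XML =
            "<object>\n<origin>1:1:1</origin>\n<normal>2:2:2</normal>\n<leafs>\n"
          + "<object>\n<origin>1:1:1</origin>\n<normal>3:3:3</normal>\n<leafs>none</leafs>\n</object>\n"
          + "</leafs>\n</object>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Absolute path to the text of the second-level leafs element. */
    static String innerLeafsText(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/object/leafs/object/leafs/text()", doc);
    }

    /** True if the leafs child of the given object element is exactly "none", ignoring whitespace. */
    static boolean hasNoLeafs(Node objectElement) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(
                "normalize-space(leafs) = 'none'", objectElement, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node inner = (Node) xpath.evaluate("/object/leafs/object", doc, XPathConstants.NODE);

        System.out.println(innerLeafsText(doc));                  // text of the inner leafs
        System.out.println(hasNoLeafs(doc.getDocumentElement())); // outer object has a nested object
        System.out.println(hasNoLeafs(inner));                    // inner object is a leaf
    }
}
```

Comparing the normalized string value avoids touching getFirstChild() at all, so surrounding whitespace-only text nodes no longer matter.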

how to use XPath to find the node value with CDATA tag in java

I am using XPath to parse RSS XML data, and the data is
<rss version="2.0">
<channel>
<title>
<![CDATA[sports news]]>
</title>
</channel>
</rss>
I want to get the text "sports news" using the XPath "/rss/channel/title/text()", but the result is not what I want: the actual result is "\r\n". How can I get the result I want?
The code is below:
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
Node node = (Node) xPath.evaluate("/rss/channel/title/text()", doc,XPathConstants.NODE);
String title = node.getNodeValue();

Try calling setCoalescing(true) on your DocumentBuilderFactory. The parser then converts CDATA sections to text nodes and merges them with adjacent text, so the title element contains a single text node.

You could try changing the XPath expression to
"string(/rss/channel/title)"
and use return type STRING instead of NODE:
String title = (String) xPath.evaluate("string(/rss/channel/title)", doc,
XPathConstants.STRING);
This way you are not selecting a text node, but rather the string value of the title element, which consists of the concatenation of all its descendant text nodes.
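
Here is a minimal sketch of both suggestions against the RSS snippet from the question. The class and method names are made up for illustration. In the string-value variant I use normalize-space() in place of string() so that the line breaks around the CDATA section are dropped as well. That substitution is my own, not part of the answer above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataTitleDemo {

    static final String RSS =
            "<rss version=\"2.0\">\n<channel>\n<title>\n<![CDATA[sports news]]>\n</title>\n</channel>\n</rss>";

    /** Reads the string value of the title element instead of selecting a single text node. */
    static String viaStringValue(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate("normalize-space(/rss/channel/title)", doc);
    }

    /** Keeps the text() query but lets the parser merge CDATA with the adjacent text. */
    static String viaCoalescing(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        Node text = (Node) xPath.evaluate("/rss/channel/title/text()", doc, XPathConstants.NODE);
        // The merged text node still carries the surrounding line breaks, so trim it.
        return text.getNodeValue().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaStringValue(RSS));
        System.out.println(viaCoalescing(RSS));
    }
}
```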

Document - How to get a tag's value by its name?

I'm using Java's DOM parser to parse an XML file.
Let's say I have the following XML:
<?xml version="1.0"?>
<config>
<dotcms>
<endPoint>ip</endPoint>
</dotcms>
</config>
I would like to get the value of 'endPoint'. I can do it with the following code snippet (assuming that I have already parsed the file with DocumentBuilder):
NodeList nodeList = this.doc.getElementsByTagName("dotcms");
Node nValue = (Node) nodeList.item(0);
return nValue.getNodeValue();
Is it possible to get a value of a field by a field's name? Like....
Node nValue = nodeList.getByName("endPoint") something like this...?

You should use XPath for these sorts of tasks:
//endPoint/text()
or:
/config/dotcms/endPoint/text()
Java has built-in support for XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//endPoint/text()");
Object value = expr.evaluate(doc, XPathConstants.STRING);
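
A self-contained sketch of the snippet above, run against the XML from the question. The class name EndPointLookup and its method names are made up for illustration, and the plain-DOM variant using getTextContent() is an alternative I added for comparison.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EndPointLookup {

    static final String XML =
            "<?xml version=\"1.0\"?>\n<config>\n<dotcms>\n<endPoint>ip</endPoint>\n</dotcms>\n</config>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Looks the value up with XPath. */
    static String viaXPath(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile("/config/dotcms/endPoint/text()");
        return (String) expr.evaluate(doc, XPathConstants.STRING);
    }

    /** Plain DOM alternative: first element with the given tag name, read via getTextContent(). */
    static String viaDom(Document doc) {
        return doc.getElementsByTagName("endPoint").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(viaXPath(doc));
        System.out.println(viaDom(doc));
    }
}
```

Note that getNodeValue() on an element node returns null, which is why the DOM variant reads the text through getTextContent().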

You could also use jOOX, a jQuery-like DOM wrapper, to write even less code:
// Using css-style selectors
String text1 = $(document).find("endPoint").text();
// Using XPath
String text2 = $(document).xpath("//endPoint").text();
